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ABSTRACT 


A dynamic and extremely eomplex landseape in seeurity and world events 
presents problems that ehallenge all seetors of soeiety to develop effieient means for 
exploring a wide range of solutions. Similarly, exponential inereases in teehnologieal 
eapability make it diffieult for eommereial and governmental leaders to assess those 
proposed solutions. Computer experimentation is an established method for examining 
eomplex models with large numbers of faetors. Orthogonal and nearly orthogonal Latin 
hypereubes are proven teehniques for designing simulation experiments. A key property 
of these effieient, spaee-filling designs is their ability to explore many faetors within a 
relatively modest number of design points; however, there is a limited inventory of these 
designs eurrently available. Those that have been eatalogued are usually eomputationally 
expensive to produee and have severe restrietions in the number of faetors and/or runs 
that they allow. To remedy this, we present a set of flexible methodologies to ereate 
design matriees with little or no eorrelation—ineluding saturated nearly orthogonal Latin 
hypereubes. This new family of designs ean explore as many faetors as there are design 
points. This researeh also addresses experiments that inelude a mixture of eontinuous 
and integer variables, some of whieh have different numbers of value levels. 
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EXECUTIVE SUMMARY 


Experiments using eomputer models are of great importanee to seientifie researeh, 
national defense, and publie poliey debates. This follows from eontinual improvements 
in eomputational power against a baekdrop of rising eosts and other ehallenges frequently 
assoeiated with physieal experimentation. Moreover, often it is not praetieal to eonduet 
many, or even any, physieal experiments—e.g., the long-term effeets of various polieies 
on global elimate, emergeney response to large-seale nuelear aeeidents, potential major 
military oonfliets, ete. In situations with a dearth of real-world experimental data, 
computer models are frequently used to help understand these complex issues. 

Computer experimentation in the above areas may contain thousands of input 
variables and/or take a long time (e.g., many days) to run (Kleijnen et ah, 2005). 
Researchers have several techniques to effectively extract information from these models. 
Among them are designs of experiments that are specifically developed for efficiently 
exploring high-dimensional computer models. 

Latin hypercube (LH) sampling (McKay et ah, 1979) has proven to be an 
invaluable design technique. In fact, LHs are reported to be the predominant design for 
experiments involving computer simulations (Buyske & Trout, 2001). A key reason for 
this is that they come with minimal restrictions on the number of factors (i.e., input 
variables) and sampling budget. In addition, the resultant output data allow us to fit 
many different models to multiple outputs from a single experimental set. These designs 
permit us to simultaneously screen many factors for significance and to fit very complex 
meta-models (including nonparametric) to a handful of dominant variables. This 
flexibility extends to visual investigations of the data (Sanchez & Lucas, 2002), as we get 
many viewpoints from which to observe the relationships between inputs and outputs. 

There are a large number of LHs for any given experimental condition, by which 
we mean the number of design points (n) and variables (k). A design point is a unique 
combination of the values of the input variables. Some LH designs possess better 
properties than others. Lor example, LHs can have unacceptable correlations among the 
input factors, thus hindering many statistical procedures that we might wish to apply in 
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analyzing the relationships between inputs and outputs. To mitigate this problem, various 
researehers (e.g., Owen, 1994; Ye, 1998; Cioppa, 2002; Ang, 2006; and Steinberg & Lin, 
2006) have developed algorithms to eliminate or signifieantly reduee eorrelations among 
input variables—thereby developing orthogonal and nearly orthogonal Latin hypereubes 
(OLH, NOLH). A design is nearly orthogonal if the maximum absolute pairwise 
eorrelation among the eolumns of the design matrix is < .05. 

The sueeess of these reeent efforts to ereate OLH and NOLH designs is extensive, 
but all of these approaehes are subjeet to stringent eonstraints in their dimensionality, 
usually requiring that n be a power of two or a power of two plus one. Steinberg and Lin 
(2006) state, “[t]he primary limitation to our method is the severe sample size eonstraint.” 
For instanee, to explore 12 faetors Steinberg and Lin require only 16 runs. To explore 13 
faetors, they require a design with 64 runs. 

This dissertation details an algorithm for generating NOLHs for a greatly 
expanded set of n and k, ineluding situations where k = n- 1, i.e., fully saturated designs, 
as well as methodologies that expand on the usage of LHs to inelude aeeommodating 
diserete variables into nearly orthogonal designs. We further speeify the eonditions 
under whieh these approaehes are appropriate. Our researeh goals are listed below. 

• Provide analysts the ability to apply uneorreeted LH designs to a number 
of situations in whieh seientists do not have the advantage of developing 
OLH or NOLH designs, or may not wish to use OLH or NOLH designs. 

• Provide analysts the ability to generate OLH and NOLH designs for many 
more eombinations of n and k than previous approaehes allowed—with a 
partieular emphasis on maximizing k for any given n. 

• Provide analysts the ability to ineorporate different faetor types (diserete 
and eontinuous) with different value levels (mixed faetor, mixed level) 
into a nearly orthogonal design matrix. 

• Provide analysts the ability to quiekly generate new, robust, speeial 
purpose designs to meet new and ehanging design requirements. 

To meet these goals, we quantify when random Latin hypereubes (RLHs) are aeeeptable 
and develop a number of progressive teehniques to eonstruet and use NOLHs. 

Tools and methods, based on the maximum absolute pairwise eorrelation, help 
seleet an appropriate dimension for the experimental design and obtain an uneorreeted 
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RLH with a suitable Astute seleetion of a design dimension {n = sample runs, 

k = faetors) will often allow an experimenter to quiekly generate a useful design. We 
present a proeedure that systematieally direets the seleetion of an appropriate design 
dimension and provides guidanee for aeeepting an uneorreeted RLH, as RLH designs are 
eommonly used by praetitioners. This ineludes a parsimonious funetion that relates the 
expeeted maximum absolute pairwise eorrelation to transformed values of n and k. This 
method requires no speeialized algorithms or software paekages, making it applieable in 
most situations and an attraetive option for experimenters with limited resourees. 

We expand the applieation of Florian’s method (1992) to produee new OLH and 
NOTH designs. Our studies reveal that iterative applieation of Florian’s method ean be 
extremely powerful. We demonstrate that the number of eandidate designs needed is 

n 

often mueh less than 1,000 for even large experiments. When n > 50 and 

Florian’s method is usually suffieient to transform one randomly generated LH to meet 
our NOTH eriteria. Even if we break these loose eonstraints, we often eonstruet NOLHs. 

We use optimization teehniques to eonstruet OLH and NOLH designs for 
dimensions beyond what Florian’s method allows. Optimization presents diffieulties for 
solving the experimental design problem beeause our objeetive funetion is nonlinear and 
we have integer eonstraints. As n and k inerease, the dimensionality of LHs adds to the 
ehallenges. Therefore, we develop a eonstruetion methodology based on a foeused 
optimization routine that greatly expands the set of n and k for whieh orthogonal and 
nearly orthogonal designs are available. Speeifieally, we eombine RLH generation, 
Florian’s eorrelation reduetion method, and optimization of a mixed integer problem— 
using a heuristie that foeuses on one eolumn at a time—to minimize for a speeified 

n and k. By minimizing the worst-ease eorrelation, we eontrol all other pairwise 
eorrelations. To best illustrate the power of this new methodology, we eoneentrate on 
design dimensions that are absent from existing eatalogues and ones that are the most 
diffieult to generate with previous methods; that is, experiments with n < 50 as k 
approaehes n. 
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Our results show significant improvements over the efficiency of past 
experimental designs for the same number of runs. Table 1 compares our new technique 
with other recent methods to create orthogonal or nearly orthogonal Latin hypercube 
designs. For a given n, our methods produce designs that can explore many more factors. 

Table 1. A comparison of recent construction methods to generate OLH or NOTH 
designs shows that our new method significantly increases the number of factors, 
k, that may be examined for the same number of runs, n. 


n 

Maximum k for Each Method 

Ye 

Cioppa 

Ang 

Steinherg 
and Lin* 

New 

17 

6 

7 

8 

12 

16 

33 

8 

11 

16 

NA 

32 

65 

10 

16 

32 

56 

63** 

* Steinl 

berg and Lin’s method uses n-l runs. 


** New design uses 64 runs. 

In addition to increasing the number of NOLHs, we exploit the flexibility of our 
new designs to develop an approach that handles mixed-factor, mixed-level (MFML) 
experiments. In accordance with LH sampling construct (McKay et al., 1979), OLH and 
NOLH designs assume that all variables are continuous. In practice, this assumption is 
frequently false. Many experiments include a number of discrete variables (that do not 
necessarily have the same number of value levels), along with continuous factors. We 
designate these as MFML experiments. 

MFML experiments can diminish the advantages of OLH and NOLH designs. 
Converting (by rounding) the actual values of these discrete variables onto a raw OLH or 
NOLH design, especially when there are a small number of runs, often results in an 
overall design with high correlations among input variables. To remedy the problems 
that MFML experiments present scientists, we exploit the dimensional flexibility of our 
new designs by combining them with stacking methods. We also combine our method 
with proven design techniques. The resulting MFML designs retain much of the 
orthogonality properties of the basic NOLH, thereby maintaining their utility. 

The principal shortfall of previous methods to develop OLH and NOLH designs is 
their strict limitations in dimensionality. Our new techniques and procedures overcome 
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many of these restrietions, thereby enabling seientists and analysts to apply these 
powerful designs in more situations. 
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I. 


INTRODUCTION 


Latin hypercubes (LHs) have proven to be a powerful tool for conducting 
high-dimensional computational experiments in many of the sciences and national 
security studies. Our research expands the set of orthogonal Latin hypercube (OLH) and 
nearly orthogonal Latin hypercube (NOLH) designs available to those involved in 
experimentation. We accomplish this by greatly increasing the feasible combinations of 
design points ^ (n) and factors^ (k) that are available for experimentation—^which 
heretofore have been highly constrained. Furthermore, we offer methods to create nearly 
orthogonal designs when some of the input variables are discrete—perhaps with a large 
number of possible values. Moreover, our work provides a descriptive study of the 
correlation structures inherent in random and specially constructed LHs. The culmination 
of this dissertation provides the first high-dimensional, fully saturated NOLHs. 

A. CHALLENGES IN EXPERIMENTAL DESIGNS FOR SIMULATIONS 

Experimentation is fundamental to knowledge acquisition. Unfortunately, 
physical experimentation is often infeasible due to safety, money, time, or resource 
constraints. Consequently, experimenters often turn to computational experimentation, 
such as computer simulation, of the system of interest. The ability of computers to 
simulate increasingly complex problems provides analysts greater potential to assist 
decision makers, such as those in the Department of Defense (DoD). Network-centric 
warfare and irregular warfare are but two of the areas that involve vast numbers of 
quantitative and qualitative variables. Often these simulations contain hundreds, or even 
many thousands, of input variables (Saeger & Hinch, 2001). The commercial world and 
natural scientists face similar dilemmas. Studies in human behavior and biomimetics 
often use computer simulations and rely on efficient designs of experiments (DOEs), 


1 A design point is a unique combination of the values of the factors. We use the terms design point 
and run interchangeably. 

2 Factors are input variables whose settings or values can be controlled by the experimenter. We use 
the terms/actor and variable interchangeably. 
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adopting any technique that allows analysts to “systematically examine a broader range 
of possible innovations . . (Booker, 2005). 

Researchers implement a number of methodologies to extract the most useful 
information from simulations. Among these methods are experimental designs 
specifically developed for efficiently exploring computer models (Kleijnen et al, 2005). 
The design specifies the inputs for the experiments. In particular, given that n 
experiments are to be conducted over k input variables, the DOE is usually specified by 
an n X k design matrix X. Each column of X represents an input variable and each row 
specifies the input variable values for a single design points (see Table 2). 

Table 2. An example design matrix for three factors (i.e., k = 3) and 

eight experiments (i.e., n = 8). 


Design 

Point 

Factor 1 

Factor 2 

Factors 

1 

1 

0.5 

0 

2 

1 

0.5 

10 

3 

1 

1.5 

0 

4 

1 

1.5 

10 

5 

0 

0.5 

0 

6 

0 

0.5 

10 

7 

0 

1.5 

0 

8 

0 

1.5 

10 


The information that is obtainable by analyzing the data after conducting the 
experiments depends critically on the design. Eor example, if a quantitative input 
variable only has two distinct values, then the expected response cannot be modeled as a 
quadratic in that variable. If we know in advance what metamodels we want to fit to 
describe the response surface of the simulation model (Eaw & Kelton, 2003), and the 
error structure of the experiments, then an optimal design may exist (Eedorov, 1972). 
However, in many cases, especially when conducting exploratory analysis, we desire 
designs that “allow one to fit a variety of models” (Santner et ah, 2003). Eor such 
situations, EH sampling (McKay et ah, 1979) has proven to be an invaluable technique 
for designing high-dimensional computer experiments. Eor the past 15 years it has 
become an “important part of uncertainty analyses” (Wyss & Jorgensen, 1998). Under 
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general eonditions, LH designs perform exeeptionally well in eomparison to other 
popular experimental design options (Johnson, 2008). In faet, LHs are purported to be 
the predominant DOEs involving eomputer simulations (Buyske & Trout, 2001). 

LHs are popular designs for simulation exploration beeause of their design and 
analytieal flexibility (Lueas & Sanchez, 2006a). An LH design can easily be constructed 
for any number of continuous input variables (k) and sampling budget (n) as long as 
k <n. Indeed, many simulation software packages, even spreadsheet simulation add-ons, 
can generate LHs (Sugiyama & Chow, 1997). With sufficiently large n, the output data 
can be fit to a wide variety of metamodels (from simple to complex) for many different 
responses. In particular, LHs enable us to simultaneously screen many factors for 
significance and fit very complex metamodels to a modest number of critical variables. 
Moreover, LHs have good space-filling properties, i.e., they are good at providing 
“information about all portions of the experimental region” (Santner et al., 2003). 

The existence of desirable properties in LH designs, which correspond with their 
degree of utility, is dependent on the absence of correlation among the column s of the 
design matrix. Many analytical techniques that experimenters apply to computer 
outputs—such as regression modeling and partition trees—suffer when there is 
multicollinearity among the input variables (Montgomery et al., 2001 and Kim & Loh, 
2003). Consequently, analysts usually desire a design matrix with a correlation structure 
that is close to that of the identity matrix (Iman & Conover, 1982, and Iman & 
Davenport, 1982). Unfortunately, generating LH designs involves an inherent 
randomness in the construction of the column s —a random permutation of the value 
levels, 1 through n. This construction method invites the possibility of substantial 
multicollinearity among the columns of the design matrix, especially when n is not 
substantially larger than k. 

To mitigate the difficulties that can be caused by correlations among the column s 
in the design matrix, a succession of methods have been developed that reduce or even 
eliminate the correlations. An LH is called an orthogonal Latin hypercube (OLH) if the 
correlation coefficient between all pairs of column s in the design matrix is zero. We will 
denote an OLH with k variables in n runs as . In some situations, an OLH may not be 
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available or it may come with poor space-filling properties (Cioppa & Lucas, 2007). In 
such a situation, a nearly orthogonal Latin hypercube (NOLH) may exist and be 
preferred. We will define NOLHs precisely later in this section. An NOLH with k 
variables in n runs is denoted as . The absence or near absence of correlations among 
their design columns makes OLH and NOLH designs attractive. However, there are only 
a small number of these catalogued designs available and most applications do not use an 
OLH or NOLH design. 

A variety of reasons exist for why OLH and NOLH designs are not used more 
often than they currently are. Efforts to reduce or eliminate correlation are often 
computationally expensive and time consuming (Cioppa, 2002). The complexity of 
algorithms involved in creating them typically requires specialized software and 
programming (Iman & Conover, 1982). Most detrimental for increasing the use of OLH 
and NOLH designs are the severe restrictions on the dimensionality of the design matrix 
(and hence, experiment). In every case, the dimensions of new designs restrict n to be a 
fimction of a power of 2, or a power of 2 plus 1. Furthermore, methods for creating LH 
designs assume continuous variables, which necessitate rounding off the raw design 
values to accommodate discrete variables. The resulting design values do not typically 
retain the orthogonality (or near orthogonality) properties of the raw design. 

This dissertation overcomes the above obstacles and greatly expands the set of 
readily available OLH and NOLH designs. It presents new construction methods, and 
augments previous techniques, to generate design matrices (A) with little or no 
correlation. While LH designs assume continuous variables, our work also encompasses 
discrete variables that may have unequal numbers of levels. The resulting design 
matrices fill substantial gaps in the current library of OLH and NOLH designs. 

B. RANDOM LATIN HYPERCUBES AND INHERENT CORRELATION 

The foundation of our work resides in the characteristics of the random Latin 
hypercube (RLH). We briefiy describe the nature of the RLH as a reference for defining 
the terminology we use in this dissertation and setting the context for our enhancements. 
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1. 


Random Latin Hypercube Generation 


RLH generation is so named to emphasize the randomness in its eonstruetion. 
Generating an RLH is relatively simple. Consider an n by A: matrix, X. Owen (1994) 
deseribes the element in the /* eolumn, Xj , of an LH as 


X! = F: 




, for/ = l,...,n andy = 1,...A:, where ;r .(l),...,;r .(n) is one of the 


V J 

n\ possible random permutations of 1 ,..., n in whieh all n\ permutations are equally 
likely. The Fj,j= 1, ...k are continuous and invertible distribution functions. Finally, 


the U.j, i = = 1, ...k are independent and identically distributed uniform [0, 1] 

random variables. 

Similar to Owen (1994), Ye (1998), Cioppa (2002), and Steinberg and Lin (2006), 
we use a lattice version of the LH (Patterson, 1954) and assume that F. is the distribution 

function of a uniform [0, 1] for / = n, j = \,...k. We further replace the uniform 
random variable, U , with 0.5, the median of a uniform [0, 1] distribution; thus creating a 


center point in each stratum. Therefore, generating an RLH reduces to independently 
generating k columns, each a permutation of the integers i = 1 ,..., n, where the n\ 
permutations are equally likely. For a given column, j = 1, 2,..., k, this yields; 


Y/ = 




for i = 1, 


n . The columns of the resulting design matrix, X ^, for 


j = k, are A: independent permutations of the vector x, which contains elements 


derived from the following simplified formula (Owen, 1994): x. = 


(i-.i) 


, for i = 1,..., n. 


We illustrate this method by examining three factors {k = 3) with four sample runs 
(n = 4). The experimenter calculates the first element, / = 1, from the simplified formula 

-—^ = —— = 112l = L _ Computation of the remaining elements, for i = 2, 3, and 4, 
n 4 4 8 


results in the vector x = 


^357 

8 ’ 8 ’ 8’8 


. A random permutation of x for each column 


produces a lattice LH for three factors and four design points, as shown in Figure 1. 
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8 8 8 

111 
v8 8 8. 


Figure 1. Three permutations of the veetor x results in an RLFI design for three 
variables and four sample design points using Owens’ simplified formula to 
ereate a lattiee LH. 

The eolumns in X are often translated and sealed for a partieular applieation. For 
a given eolumn, the values are equally spaeed between the minimum and maximum 
values allowable for the eorresponding variable. Given this lattiee eonstruetion, a total of 
(n!) permutations are possible for eaeh column. This does not create (n/)^ unique LFls 
because some designs can be mapped to others by permuting rows and columns. 
However, the total number of unique designs is astronomically large. For example, there 
are over 87 billion unique designs with k=2 and n=14. 

The simplicity and flexibility of this sampling scheme has a price. It is precisely 
the randomness of column generation that opens the possibility for strong 
multicollinearity among the columns of the design matrix. 


Measure of Nonorthogonality for Design Selection 


Along with Tang (1998), we recognize that selecting a design based on the 
correlations between the columns of the design matrix is a reasonable way to obtain a 
design with acceptable nonorthogonality. Cioppa (2002) specifically uses the maximum 
absolute pairwise correlation between columns of an LH to obtain OLH and 
NOTH designs. 

Following Cioppa, we designate the maximum absolute pairwise correlation 


iPmap) the key measure for discriminating between designs. There are 


V^y 


pairwise 


correlations in a design with k variables. Computation of the correlation coefficient 
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between any two veetor eolumns in design matrix X is straightforward. Using Pearson’s 
equation, the eorrelation between X‘ and X-' is given by 

^[(x;-F)(x/-F)] 

V b=l b=l 

where x‘ and x ^ are the mean values of the and 7 * eolumns and xl is the value of 

the eolumn. 

The largest absolute eorrelation among the eolumns gives the most extreme 
pairwise eorrelation. We use this value to define the degree of nonorthogonality of a 
design. By minimizing the worst-ease pairwise eorrelation, we eontrol all other pairwise 
eorrelations. Henee, we foeus attention on throughout our study and ereation of 

new OLH and NOTH designs. Speeifieally, 

I A, I )■ 

Our eolleetion of teehniques minimizes We further use p^^ as a guide for 

eombining reliable design approaehes with our new designs, thereby expanding the utility 
of OLH and NOLH. 

The role of p^^^ in our study is evident in the large values in whieh it appears in 

RLHs, espeeially when k is not too mueh smaller than n. Consider the design matrix in 
Figure 2. Three new permutations of veetor x result in this design matrix, where eolumns 
2 and 3 have a eorrelation value of-1, i.e., p^^^ - 1. 

"3 ]_ U 

8 8 8 
5 3 5 

8 8 8 * 

7 5 3 

8 8 8 

i Z 1 

v8 8 8y 

Figure 2. In this RLH, eolumns two and three give [pja | = P^ap - ^ ■ 
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In a sampling method in which all possible RLH designs are equally probable 
(Satterthwaite, 1959), the probability that a highly correlated design occurs is high. To 
illustrate, we randomly generate 1,000 4 x 3 RLH designs and measure the of each. 

Over 77% of the designs have a > 0.80, with nearly 25% having p^^^ = 1. Clearly, 

the possibility of incurring high correlations in an RLH can be problematic. 

Whereas randomness creates a concern of possible high correlations among the 
column s in the design matrix, it may also result in RLH designs that have very small 
correlations. This observation offers an opportunity to develop a systematic process for 
selecting suitable RLH designs and applying a variety of transformation and generation 
techniques to minimize their nonorthogonality. In doing so, possibilities to reduce 
restrictions in dimensionality also emerge. 

C. A PROGRESSION OF OLH AND NOLH CONSTRUCTION METHODS 

Techniques to create new OLH and NOLH designs have produced incremental 
increases in feasible n and k combinations. A succession of studies extends the 
foundational paper by McKay et al. (1979) to produce methods that can reduce the 
correlations of a given LH design. Iman and Conover (1982) induce rank correlation in 
the design to correspond with the correlation that one would expect from the input 
variables, resulting in a means to control the correlation of the design matrix. Similar to 
Iman and Conover (1982), Florian (1992) updates LH sampling designs through matrix 
transformation. Owen (1994) introduces a process based on Gram-Schmidt 
orthogonalization to control correlations among variables. These methods attempt to 
convert LH designs to correspond with correlation matrices that approach the identity 
matrix. They do not guarantee the complete elimination of correlation from the design, 
although uncorrelated designs are achieved in some cases. 

A relatively recent wave of techniques completely eliminates correlation among 
the columns of an LH design during its construction. Ye (1998) introduces a class of 
LHs with no correlation among its columns and designated them OLHs. Ye finds that it 
is possible to develop OLH designs when the number of runs, n, is a power of 2 plus 1. 
Through Ye’s method, experimenters can generate an orthogonal design {p^p =0 and 
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condition number =1)^ with k = 2m - 2 for an experiment with n = 2"* + 1 runs, where m 
is a positive integer. Each column in the design uses a permutation matrix that is based 
on the Kronecker product (Graham, 1981) of identity matrices and a transposition matrix. 
Ye uses all m of his main permutation matrices and m - 2 permutation matrices from the 

possible pairwise multiplieations of the main permutation matrices. Ye (1998) 

understands that the sizes of his OLH designs are somewhat inflexible, but asserts that 
eomputer power mitigates this disadvantage. 

Cioppa (2002) extends Ye’s method for constructing OLH designs by increasing 
the number of permutation matrices that generate mutually orthogonal columns. Instead 


of using m - 2 of the pairwise combinations, Cioppa uses all 


m-1 
V 2 y 


pairwise matrix 


multiplication combinations. In so doing, Cioppa increases the number of columns (k) in 


the design matrix from 2m - 2 to m + 


m-1 
V 2 y 


while maintaining their full orthogonality 


for the same number of runs. 

Cioppa, however, notices that this approach can yield an OLH that does not have 
good space-fdling properties. Hence, Cioppa develops a computationally intensive 
algorithm that improves on the space-filling properties of Ye’s designs by accepting a 
small amount of nonorthogonality. Cioppa selectively catalogues designs with 
space-filling qualities that are better than Ye’s designs, in terms of the modified L 2 
discrepancy {ML 2 ) and Euclidean maximin (Mm) measures. The amount of 
nonorthogonality that Cioppa accepts in terms of is at most 0.03 and X must also 


have a condition number less than 1.13 (when the columns of X are scaled to span the 
interval [-1,1]). We recall that an OLH has equal to 0 and condition number equal 


to 1. The slight nonorthogonality of the design earns its name as an NOLH. Cioppa 
demonstrates that his method works for up to k = 67 and conjectures that for any positive 


3 The condition number is the ratio of the largest to smallest eigenvalues of X’X, where X is the 
n X k design matrix (Cioppa & Lucas, 2007). The condition number represents the design matrix’s 
sensitivity to small changes in its values (Leon, 2002). 
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integer value of m, an NOLH design ean be eonstrueted whenever n = 2“ + 1 and 


m + 


m-1 
V 2 y 


Existing space-filling NOLH designs from Cioppa’s method are 


catalogued for up to 29 factors within 257 runs. 

Ang (2006) augments Cioppa’s methodologies to create larger OLH and NOLH 


dimensions. While Cioppa uses all 


m-1 


pairwise matrix multiplication combinations, 


Ang uses all three-way, and larger, product combinations to create new permutation 
matrices. Each new permutation matrix generates a new column in the OLH design. The 


m-1 

number of factors that Ang’s designs can explore is k = 1+/ ^ 

j=i \ j J 


where p (with the 


maximum p = m-\) is the p-way combinations used for developing the permutation 
matrices and m is a positive integer. The number of runs required to explore k factors is 
still n = 2’^ + 1. Lor instance, a design with m = 5 can explore k= \6 factors when using 
the maximum (p = 4) combinations. Such a design requires n = 2^ + 1 = 33 runs. This 
technique results in designs that can explore more factors with the same number of runs. 
These designs begin to approach a certain saturation level, with a noted decrease in their 
space-filling properties as p increases. Additionally, Ang (2006) redefines the NOLH 
criteria as <0.05 and a condition number of no more than 1.20. We use these 


thresholds for our new methodologies, with a particular emphasis on minimizing . 

Steinberg and Lin (2006) rotate two-level factorial designs to construct OLHs for 
n = 2^, with h a power of 2, and the maximum number of factors being Bh><h, where 

, with [cj = the maximum integer less than or equal to c. Lor instance, for 


Bu 


n — \ 


n = \6 runs, h = A and 5/, = 3, so a Oj® is possible. We note that these OLHs do not 

include a center point, as do previously discussed OLHs. 

The success that these recent efforts have had in creating OLH and NOLH 
designs is extensive, but OLH and NOLH designs are still subject to stringent constraints 
in their dimensionality. As Steinberg and Lin (2006) state, “[t]he primary limitation to 
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our method is the severe sample size constraint.” We see that to explore 12 factors with 
their algorithm requires only 16 runs. To explore 13 factors, Steinberg and Lin require a 
design with 64 runs. In comparison, Ang (2006) needs 33 runs to explore up to 16 
factors. However, to explore 17 factors, Ang must increase the number of runs to 65. 
Table 3 compares our new method with recent methods to create orthogonal or nearly 
orthogonal designs. 


Table 3. A comparison of recent construction methods to generate OLH or NOTH 
designs shows that our new method significantly increases the number of factors, 
k, that may be examined for the same number of runs, n. 


n 

Maximum k for Each Method 

Ye 

Cioppa 

Ang 

Steinberg 
and Lin* 

New 

17 

6 

7 

8 

12 

16 

33 

8 

11 

16 

None 

32 

65 

10 

16 

32 

56 

63** 


* Steinberg and Lin’s method uses n - 1 runs. 
** New design uses 64 runs. 


All existing OLH techniques have a common inflexibility in dimensionality. Our 
new approach constructs flexible NOLH designs—that is, sample sizes need not be 
related to a power of 2 (any n is allowable) and many feasible k (often all A: = 1, ...n- 1) 
are possible. Indeed, in many cases, we create fully saturated designs that are orthogonal 
or nearly orthogonal. Furthermore, the flexibility of our OLH and NOLH designs make it 
possible to combine them with other methods to create experimental designs that include 
discrete variables. 

D. DISSERTATION ORGANIZATION 

The flow of this dissertation is a progression of techniques and procedures that we 
build to meet our research goals. The resulting tools and products greatly expand our 
ability to explore broad regions of a complex simulation model containing a 
high-dimensional input space, characterized by a response surface that may be highly 
nonlinear. We introduce several methodologies that enable analysts to use OLHs and 
NOLHs in more situations. Furthermore, we specify the conditions under which the 
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various approaches are appropriate. We also develop algorithms for aeeommodating 
diserete variables into nearly orthogonal designs. Our researeh goals are listed below. 

• Provide analysts the ability to apply uneorreeted LH designs to a number 
of situations in whieh scientists do not have the advantage of developing 
OLH or NOLH designs, or may not wish to use OLH or NOLH designs. 

• Provide analysts the ability to generate OLH and NOLH designs for many 
more eombinations of n and k than previous approaehes allowed—with a 
partieular emphasis on maximizing k for any given n. 

• Provide analysts the ability to ineorporate different faetor types (diserete 
and eontinuous) with different value levels (mixed faetor, mixed level) 
into a nearly orthogonal design matrix. 

• Provide analysts the ability to quiekly generate new, robust, speeial 
purpose designs to meet new and ehanging design requirements. 

The following four ehapters eaeh deseribe teehniques to reach our study goals. 
Chapter II diseusses RLHs more thoroughly, to provide the reader with a better 
understanding of their utility and to build a foundation for other teehniques. It eontains 
the mathematieal theory underlying our designs and the details neeessary to construet 
them. We diseuss as the nonorthogonality measure that is our foeal point for all 

design teehniques. We ereate tools based on to seleet dimensions for an RLH 

design that has a very good ehanee of possessing an aeeeptable nonorthogonality 
measure. Chapter III quantifies the effeetiveness of Florian’s (1992) eorrelation 
reduction method, as well as its limitations, for produeing OLH and NOLH designs. 
Florian’s method is an earlier teehnique that has great utility in eonstrueting a number of 
nearly orthogonal designs that do not eonform to previous restrietions in their 
dimensions. Chapter IV eontains a eombined applieation of RLH generation, Florian’s 
(1992) method, and mixed integer programming (MIP) to produee new OLH and NOLH 
designs that greatly expand on what was previously available. The diffieulty in using 
optimization to solve eombinatorial problems, sueh as DOFs, prevents seientists from 
pursuing it as a praetieal solution method. We formulate the MIP problem to make 
optimization a viable option. Applieation of these eombined teehniques results in a new 
family of designs that we eall saturated nearly orthogonal Latin hypereubes (S-NOLH). 
We diseuss in detail the advantages that this family of designs offers to seientists. 
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Chapter V considers mixed-factor, mixed-level experiments that include (perhaps many) 
discrete variables that do not necessarily have the same number of possible values. The 
mixture of these variables utilizes a systematic process for leveraging OLH and NOTH 
designs constructed using other techniques. We apply crossed designs, column 
permutation, stacking methods, and logical lines of processes to produce designs that 
retain much of the orthogonal properties of the base design. The base design is the set of 
continuous variables from which we create an OLH or NOTH design, as articulated from 
techniques in earlier chapters. 

Chapter VI demonstrates a case study for our designs. We survey a number of 
studies, over a variety of applications, which use our new designs. Finally, Chapter VII 
summarizes our conclusions in this research area and provides directions for 
future research. 
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II. SHAPING CONDITIONS FOR APPLYING UNCORRECTED 
RANDOM LATIN HYPERCUBES 


Latin hypercubes (LHs) are one of the most commonly used designs for computer 
experiments (Kleijnen et ah, 2005). However, the regularity in which high correlation 
occurs in some of these designs complicates analysis of the results. Efforts to reduce or 
eliminate correlation are often computationally and monetarily expensive, require special 
purpose software, and can be time consuming. Many scientists frequently use 
uncorrected^ random LH (RLH) designs in their experiments and attempt to work with 
the inefficiencies that may result from them. 

We develop tools and techniques, based on the maximum absolute pairwise 
correlation, to select an appropriate dimension for the experimental design and obtain an 
uncorrected RLH with acceptable correlations. This method requires no specialized 
algorithms or software packages, making it applicable in most situations. The speed with 
which our method can be employed makes it an attractive option for experimenters with 
limited resources. 

We propose an approach to find designs with acceptable correlations among their 
colu mns : dimension selection and RLH generation. Current methods to minimize 
correlation in LH designs generally fall into two broad categories: (1) those that 
transform an original design to have an acceptable degree of correlation, or (2) those that 
eliminate correlation during construction of the design column s . Our technique falls into 
neither category. We accept the natural proclivity of RLH production to frequently have 
high correlation among the columns and use the randomness in column generation to 
our advantage. 

We assert that astute selection of a design dimension (n = sample runs, 
k = factors) and a predictive model for an RLH’s expected maximum absolute pairwise 
correlation will often allow an experimenter to quickly generate an acceptable design. 
Our method takes advantage of the ease with which an RLH design is created. Coupling 
rapid RLH generation, as we described in Chapter I, with the degree to which we can 

^ An uncorrected LH is one to which no correlation reduction method is applied. 
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predict the maximum absolute pairwise correlation, our procedure systematically directs 
the selection of an appropriate design dimension and provides guidance for accepting an 
uncorrected RLH. Because this method requires no manipulation of the RLH, any 
investigator can employ it with common software and computing tools. 

A. TOOLS FOR DIMENSION SELECTION AND RANDOM LATIN 
HYPERCUBE GENERATION 

We earlier established as our nonorthogonality measure for an LH design. 

We use this measure to select an RLH design with an acceptable degree of 
nonorthogonality. First, we use as a guide to select an appropriate design dimension 

for an experiment. Second, with the design dimension fixed, the nonorthogonality 
measure serves as a threshold for selecting an acceptable design from iterative 
generations of RLH. This section describes the development of correlation-based tools 
and the procedures for their application. 

We use these -based tools when more complex algorithms associated with 

correlation reduction methods are not readily available to the experimenter. The ease of 
generating an LH with comparatively unsophisticated means makes them an attractive 
option (Sugiyama & Chow, 1997) for creating experimental designs. If a random 
generation of an LH has acceptable correlation among its columns, an experimenter can 
reap the benefits from an efficient design with little computational effort. Cioppa (2002) 
defines efficient experimental designs as those which (i) detect as many significant 
variables, nonlinear effects, interactions, and their associated ranges as possible, 

(ii) declare significant as few non-significant variables and interactions as possible, and 

(iii) accomplishes (i) and (ii) with a minimal number of runs. This chapter introduces the 
mechanism to create experimental designs that possess acceptable levels of 
nonorthogonality from uncorrected RLHs. 

I. Creating the /?”” Table 

We first create a series of correlation tables that combine design dimensions, 
which correspond to known OLH and NOTH designs (Ye, 1998 and Cioppa, 2002), and 
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the number of RLHs to generate from whieh to seleet a design. Eaeh entry in the 
eorrelation table is an average of the minimum for a given n and k. 

We begin our work with an initial set of data that eonsists of 42 n by A: design 
eombinations. Using Cioppa’s (2002) convention for design dimensions, we explore 

f m — \\ 

combinations of n = 2'”+l for m = 1,...,8 and k = m+ for m = 1,...,16. 

\ ^ J 

Therefore, we examine design dimensions as small as {n = \1, k = 1} and as large as 
{n = 257, k = 121}. We consider only those designs with n > k. Designs where the 
number of runs is less than the number of factors produce unstable models (Leon, 2002). 
We enter NA (Not Applicable) for n < k combinations in the tables. 

The correlation tables also account for the number of RLHs to generate, and from 
which to select an acceptable design. We let G be the number of designs from which to 
select our experimental plan. Tang (1998) uses a similar procedure, comparing several 
designs before selecting one to apply correlation reduction methods. Other scientists 
consider many more designs before choosing one for an experimental design. To create 
the correlation table we generate 200 (annotated as G200) uncorrected RLH designs for a 
specific n, k combination and compute for each. Lrom this set of generated designs 

we record the smallest p^^^, which we designate as p™. We repeat this process 1,000 

times for each n by k combination and compute the average of the 1,000 p™” values, 

PmZ ■ This value is the entry in the table for the specific design dimension and estimates 

the corresponding expected p™ among 200 uncorrected RLH designs. Examination of 

these estimates show low variability, as one expects from extreme statistics (David & 
Nagaraja, 2003), and acceptable precision for our work. 

The different design dimensions vary greatly in the values of their /?”” , but the 
largest sample standard deviation for any specific design dimension is 0.025 (Ligure 3). 
Therefore, the standard deviation of /?”“ for any of these design dimensions will also be 

very small, considering that the average was a computation from 1,000 independent 
observations. Ligure 3 shows that the largest standard deviation occurs for a small design 
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n = 11, k = 16, corresponding to /?”” =0.306. The smallest standard deviation is for a 


large RLH, n = 257, k = 106 corresponding to p™ = 0.204. The majority of these have a 
standard deviation that is less than 0.01, so predictions will be meaningful. 


standard Deviation of Sample vs. Mean of Sample for Different Design 
Dimensions 
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Figure 3. The sample standard deviation of for 1,000 G200 n and k combinations 

is relatively small compared to its /?”” value. The largest standard deviation 
occurs in small designs, while the smallest deviations are for larger designs. 

Table 4 is the correlation table for different design combinations for G200. Its 

organization shows that as the number of runs for a fixed k increases, /?“” decreases, 

which is consistent with Owen (1994). Conversely, p”” inereases as k inereases for a 

fixed n. The table provides the design combinations which are likely to produce a design 
with the required by generating 200 RLHs. If the table indicates that the initial 

design dimension cannot attain the desired correlation within 200 RLHs, then the table 
guides the experimenter to increase n, decrease k, or both, to have a reasonable ehance of 
obtaining an RLH with an acceptable . This tool allows the experimenter to 

ascertain a realistie expectation of p^^^ for a given design dimension. 
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Table 4. This portion of the G200 table shows the /?”“ for different design 

combinations where the number of designs generated in each trial is 200. We 
enter not applicable (NA) for design combinations where n<k. 


G200 

K7 

Kll 

K16 

K22 

K29 

K37 

K46 

K56 

K67 

K79 

K92 

K106 

K121 

N17 

0.3066 

0.4188 

0.4932 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

N33 

0.2108 

0.2931 

0.3503 

0.3923 

0.4267 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

N65 

0.1485 

0.2073 

0.2481 

0.2793 

0.3038 

0.3244 

0.3416 

0.3567 

NA 

NA 

NA 

NA 

NA 

N129 

0.1044 

0.1460 

0.1756 

0.1982 

0.2161 

0.2310 

0.2432 

0.2544 

0.2638 

0.2721 

0.2799 

0.2869 

0.2931 

N257 

0.0738 

0.1032 

0.1240 

0.1403 

0.1530 

0.1638 

0.1728 

0.1805 

0.1871 

0.1935 

0.1988 

0.2038 

0.2082 


Table usage is straightforward. Suppose an analyst wishes to explore 20 factors, 
while maintaining < 0.20, but is uncertain to the number of runs to perform. The 

shaded block of cells in Table 4 indicates that it is possible for the experimenter to obtain 
an RLH with an acceptable within 200 RLHs if the design consists of somewhere 

between 65 and 129 runs. The table also guides the analyst to consider fewer factors. 
Additionally, we note that randomly generating an NOTH is unlikely. 

One could also increase G. However, empirical study shows that increasing G, 
even to 1,000, gives only marginal improvement from G200. Table 5 gives the standard 
deviations of GIOOO RLHs for design combinations of n = 17 and 257 with k = l and 106 
and compares them to those for G200 RLHs. The greatest difference in standard 
deviations in GIOOO and G200 designs is for n = \1, k = 7, where GIOOO is less than 

G200 by 0.0039. For the magnitudes of p™ shown in Table 4, 0.0039 is small. 


Table 5. We show a few different design combinations to compare the standard 
deviations of 1,000 GIOOO and G200 RLHs. The greatest difference in standard 
deviations between GIOOO and G200 is for the design, n = 17, k = 7, where 
GlOOO’s standard deviation is smaller by 0.0039. 


GIOOO 


k7 

kl06 

nl7 

0.02i5 

NA 

n257 

0.0053 

0.0028 

G200 


k7 

kl06 

nl7 

0.0254 

NA 

n257 

0.0066 

0.0035 
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Likewise, the magnitude of for GIOOO and G200 in Table 6 shows that the 

differences are very close to the standard deviations. We conclude that G200 is a 
reasonable number of RLHs from which to choose the design with the best observed 

since the improvements from a sample size that is five times larger are small. 

Table 6. A comparison of /?““ for 1000 GIOOO and G200 shows that the 
differences are within the magnitude of the standard deviations shown in Table 5. 


GIOOO 


k7 

kl06 

nl7 

0.2737 

NA 

n257 

0.0652 

0.1994 

G200 


k7 

kl06 

nl7 

0.3066 

NA 

n257 

0.0738 

0.2038 


The general guidance described in this section frames the design dimensions that 
are necessary to fulfill the analyst’s need. However, this approach only provides a rough 
estimate of the design dimensions. We develop another tool to specify n and k. 

2. A Function to Predict /?“” 

Our goal is to develop an equation that uses the values of n and k to predict /?”” 
from 200 RLH designs. The objective is an equation that is sufficiently simple to use 
with a calculator. Using data from the /?”“ table with the formula allows the 

experimenter to enter different (n, k) pairs until a specific design dimension meets an 
acceptable nonorthogonality value. 

a. Initial Analysis of p™” 

We use the /?”” data from which we developed the tables in the previous 
section to construct a parsimonious predictive equation. Patterns are evident in the 
relationships between p™ and {n, k) when either n or k is constant and the other 
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changes. Grouping /?”“ values based on n uncovers clear patterns in the data. Figure 4 


shows that as k increases and n is constant, the relationship between p™” and k appears 

logarithmic. The overlaps in the groupings of n as increases also indicate an interaction 
between n and k. 



Figure 4. A logarithmic pattern appears between /?”” and k when n is constant; 
Groups of n are presented in increasing order. 

Similarly, Figure 5 displays an emerging pattern when k is constant and n 

increases. The relationship between and n displays an exponential decay. These 


tendencies for p™ regarding n and k, respectively, provide a clue to their transformation 
in order to draw a linear relationship with each. 
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Figure 5. An exponential decay or a negative power function relationship appears 
between /?”” and n when k is constant. For clarity, only a few sets for fixed 
k are in the chart. 

Transforming n and k results in each having a nearly linear relationship 
with /?“” . Owen (1994) shows that the variance of the root mean square correlation 
(Prms) uncorrected LH design is related to n ^ We examine different 

transformations of n to determine its linear relationship with p”” . A transformation of 
shows a nearly linear relationship when k is constant. Figure 6 shows an example of 
a nearly linear relationship between and when k = 7. 
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Figure 6. There is a nearly linear relationship between p™” and transformed n. For 
illustration we show the graph for = 7. 

Similarly, a transformation of k to reveals a nearly linear relationship 
with , when keeping n constant. Figure 7 shows an example of a nearly linear 


relationship between and k for a set of data where n = 257. 



Figure 7. A nearly linear relationship appears between and transformed k. We 
illustrate the case for n = 257. 
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Owen (1994) studies the relationship between the root mean square of an 
uneorreeted Latin hypercube sample (LHS) and n, as n increases. Owen shows 
empirically that the relationship between the of an uncorrected LHS and n to be 

log(P™„) = -0.51og(n). 

Owen’s results support the exponentially decaying relationship that we 
find between p™ and n. We note that this equation is independent of k. To complete 

our study, we explore the simultaneous relationships of n and k with /?”” . 

b. Exploring Simple Linear Regression Models for /?”” 

Examination of the relationship between /?““ and the two transformed 
variables, and A: , reveals that it is possible to fit a multiple linear regression 
model to the data. A study of simple linear relationships between /?”“ and each of the 
transformed values shows that they are nearly linear. Because of an earlier indication 
that an interaction exists between n and k, we also explore the relationship between p™” 

and . Fitting simple linear regression models of p™” versus , A: and 

^-2/3a.-i/ 3 model having an of 0.98 or greater. 

c. A Multiple Linear Regression Model for /?“” 

The above results indicate that a multiple linear regression model with two 
main effect terms and one interaction term can be a good predictor for p™” . To fit a 

multiple linear regression model we use /?”” data from 115 different design dimensions. 
The least squares fit gives an = 0.99 . Our estimated predictor, which we denote by 
(pZI) , is given by 

Ip™ = 0.0873 + l.S59n-^'^ - 0. lOOA: ''' -11.702n . 
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d. Adequacy of the Multiple Linear Regression Model for /?”” 

The multiple linear regression model fit adequately estimates expected 
p™”. The statistics from the predicted values’ residuals are given in Table 7. Of course, 

the residual mean is zero. A standard deviation of 0.00966 and a range of 0.05 are 
relatively small for the values that the model is predicting. The median and the skewness 

value indicate that the model tends to slightly underestimate the actual values. 


Table 7. Descriptive statistics from the residuals derived from the MLR predictions 
show negative skewness and a relatively small range. The model tends to slightly 
underestimate the value of the minimum maximum absolute pairwise correlation 

value of 200 RLH designs. 


Statistics for Residuals 

Median 

0.0029624 

Standard Deviation 

0.0096594 

Kurtosis 

0.8707658 

Skewness 

-1.0013506 

Range 

0.0501953 

Minimum 

-0.0364850 

Maximum 

0.0137103 

Count 

115.0000000 


It is evident from Figure 8 that there are some curvilinear tendencies in the 
residual plot and the residuals do not follow a normal distribution, but the residuals are 
not overly large with respect to the predicted values. Another model that includes 
quadratic terms improves the predictive capabilities of the model, but requires a more 
complex expression (see Appendix E). Considering the parsimony that we desire and the 
relatively small errors, we accept this model. 
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Predicted p 

r map 


Figure 8. This residual plot of 115 different predicted p™” values shows curvature. 

e. Application of the (p”) Model 

We now discuss the use of the equation to estimate expected 

p™”, in conjunction with the p™ table. The p”” table provides general guidance for 
the range of n and k in which a specific correlation value may reside. Because of its 

simplicity, the | p™” j equation can easily be incorporated into a spreadsheet to compute 

estimated values of p™”, as the experimenter varies n and k. Calculations stop when the 

experimenter finds an n and k combination that results in an acceptable correlation value. 
These are the design dimensions that afford the analyst the best chance of finding an 
uncorrected RLH with the desired p,^^^. Because the equation is sufficiently simple, one 

can also solve for any of the three values, when the other two are fixed. 

To produce the design for designated n and k we generate an RLH and compute 
Pnmp- The experimenter may use any number of software programs to produce RLH 
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designs and to measure . Within G generations of RLH designs, we assert that the 
analyst will find a design with an acceptable . 

B. DESIGN DEVELOPMENT WITH CORRELATION-BASED TOOLS 

We employ the correlation-based tools that we have developed in Section A. A 
step-by-step procedure guides experimenters in implementing these tools. A constructive 
example illustrates the leverage that experimenters can gain from RLH designs. 

I. The Process for Dimension Selection and RLH Design Generation 

Our procedure for obtaining an experimental RLH design with an acceptable 
nonorthogonality measure has the following steps. 

Step I: Enter the p™” table to obtain general guidance about design dimensions. Entry 
into the table may occur in one of four ways: 

• The experimenter may have a correlation threshold for a specific number 
of factors. 

• The experimenter may have a limit for the number of sample runs and a 
correlation threshold. 

• The experimenter may have a correlation threshold and an idea for the 
design dimensions. 

• The experimenter may have a correlation threshold with little or no idea 
about the design dimensions. 

Step 2: Erom the /?”” table, select the range of n and k based on the correlation 

threshold. If required, rough interpolation within the table values is sufficient to define 
the ranges. 

Step 3: Establish a spreadsheet to compute | /?”“ j for each design combination. 

Step 4: Calculate values. Best practices reveal that fixing k first, while 

changing n, is a good approach. 
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Step 5: Stop searching combinations when a value of j that approximately meets 
the correlation threshold is calculated. 

Step 6: Record the design combination (n, k) that meets the correlation threshold. 

Step 7: Fix the design dimension and generate up to G number of RLH designs. 
Measure for each new design and stop when the actual is satisfactorily close to 

the correlation threshold or all G designs have been generated. Note that this does not 
guarantee that the threshold is met. 

Following these steps, an experimenter can obtain an acceptable design without a 
large investment in computational costs and time. We designate the resulting RLH 
design as 

2. A Hypothetical Case Requiring an Uncatalogued Design Dimension 

This example clarifies the steps for determining the design dimensions and 
generating an acceptable RLH. An experimenter seeks an experimental design to assess 
k = 20 factors with, at most, n = 100 runs. Since a readily available or design 
does not exist, the researcher could try our methodology. The experimenter determines 
that an experimental design with p^^ < 0.22 is acceptable. To get an initial idea if a 

design that the scientist needs is possible, we enter Table 8 with correlation value, 0.22, 
k - 20, and n= 100. Because there are 20 factors to assess, the experimenter looks down 
the columns of K16 and K22, which bracket k = 20. The shaded part of Table 8 shows 
that the RLH design that can meet the correlation threshold will have between 65 and 129 

runs, because the /?”” values at these two sample run values contain the desired p^^^. 

Since the number of factors is fixed, k = 20, the experimenter is concerned with the range 
65 < n < 100 . 


^ We pattern this naming eonvention similar to Cioppa and Ye: Cioppa (2002) designated NOTH 
designs as and Ye (1998) designated OLH designs as . The symbol emphasizes its 
random nature. 
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Table 8. 


This table is a seetion of a G200 table. It has values for dimensions 

with A: = 22 or less. 


G200 

K7 

Kll 

K16 

K22 

N17 

0.3066 

0.4188 

0.4932 

NA 

N33 

0.2108 

0.2931 

0.3503 

0.3923 

N65 

0.1485 

0.2073 

0.2481 

0.2793 

N129 

0.1044 

0.1460 

0.1756 

0.1982 

N257 

0.0738 

0.1032 

0.1240 

0.1403 


After approximating the design dimensions, the experimenter uses the equation 

=0.0873 + 7.859n ^''-0.109fc-*''-11.702nto compute values from 

different design combinations until one is satisfactorily close to the correlation threshold. 
As we noted earlier, if the experimenter fixes two of the values in the equation, the third 
may be found. We recognize that this equation provides an estimate of the expected 
p™”. Since the standard deviation is small, the actual p™ will usually be close to this 

average value. However, about half of the time will be above it. If the requirement 
is strict, we can repeat the process several times with a high probability (approximately 
1 -( 0 . 5 )*^™^^) of obtaining the threshold. Another approach is to use a conservative value 
of the target p™. 

The experimenter holds k = 20 constant and incrementally increases n. Table 8 
shows that at n = 65, the correlation is between 0.2481 and 0.2793. The spreadsheet 
results in Table 9 show that an R-H design can have a that is approximately 0.22. In 

this case, it is possible to find a design that does not require all 100 sample runs. 
Depending on the cost of each run, the experimenter can extend the budget. 
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Table 9. Calculations for using the MLR model as n changes. 


n 

k 

(P,n.p) 

81 

20 

0.236 

83 

20 

0.233 

85 

20 

0.230 

87 

20 

0.228 

89 

20 

0.225 

91 

20 

0.222 


To validate the estimate for for the design, we use R, a free software 
version^ of the statistical package S-Plus, to generate 200 R-H designs, while recording 
the values. We successfully produce an RLH design with p^^ equal to 0.219. It 

may be possible to increase n to 93, but the current result 
is acceptable. 

Any experimenter can follow the process that we have described to construct a 
design that does not exist, or when there is a lack of computing power or software. An 
experimenter may also not require an orthogonal design and desire some correlation 
(Iman & Shortencarrier, 1985). Iman and Conover (1982) discuss ways to induce 
correlation, but RLH designs are an available alternative. Additionally, the new design 

dimensions that these tools provide are advantageous. The combination of p™” tables, 

(Pn™) equations, and RLH generation give researchers a simple, fast, and effective 

means to construct experimental designs with acceptable nonorthogonality. Of course, 
this procedure requires a good random number generator (Law & Kelton, 1999). 

C. EFFECTS OF FEWER CANDIDATE DESIGNS (G) 

There may be cases in which the experimenter does not wish to generate G = 200 
RLHs before selecting a suitable design. The manner in which the experimenter 

® R is freeware. The interested reader ean eontaet the author or eheek the SEED Center for Data 
Farming website at http://harvest.nps.edu for the R eode. 
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generates RLH designs may also be a eonstraint. To mitigate the impaet of sueh 
circumstances, we provide /?“” tables and |/?”” j expressions for different values of G. 

We introduce the symbol p™” as the smallest maximum absolute pairwise correlation 
value in G trials. The corresponding equation to estimate the expected smallest 
maximum absolute pairwise correlation value in G iterations is designated as | /?”” j . 

Investigating the impact of different values of G reveals two notable observations. 
We empirically show that for G > 200, the values of p™ vary only slightly. Conversely, 

as G decreases, the variance in /?“” increases, but not overly so. As a result, the 

expressions show similarity. However, the differences are important. To retain utility to 
experimenters, we set the lower bound for G at 10 and investigate 
Ge{l0, 25, 50, 75, 100, 125, 150,175,200}. 

With some slight modifications we use the same methodology to explore the 
relationship of transformed n and k values, as well as their interaction term. We lit 

multiple linear regressions for each G. We catalogue | /?”“ j tables and corresponding 
models. In our initial effort, we use 42 design dimensions to develop the 
model. We refine the (p™"] model fit using all 115 design combinations 

G200 V ' /G200 

and further use these data to develop the equations that follow. 


( min \ 

nmap I 



P, 


map 


G175 


Pm 


G150 



= 0.0873 + 7.859n^"''-0.109A: ‘''-11.702n^"'^A: 

= 0.0873 + 7.864n^"'' - 0.109A: -11.682n^"''A: 

= 0.0874 + - 0.108fc -11.650n^"'^fc 

= 0.0875 + -11.61 


( 2 . 1 ) 

( 2 . 2 ) 

(2.3) 

(2.4) 


G125 
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(2.5) 

( 2 . 6 ) 

(2.7) 

( 2 . 8 ) 


= 0.0883 + 7.996n^"'^-0.0902A: (2.9) 

CIO 

These equations have similarity in their coefficients, but the subtleties in each 
expression are important. The differences result in fairly precise estimates of expected 
p™” from a given G RLHs. This set of equations provide the experimenter with an 
option for the number of RLH designs to generate, along with design choices of n, k, and 

Pmap * 


( min 
map 


D. SUMMARY 

Scientists often use uncorrected LH designs and deal with effects of having high 
correlation among the columns of the design matrix. Our efforts simplify the process for 
any experimenter to obtain a design that meets a correlation threshold, which may not 
necessarily be orthogonal, or nearly orthogonal. We define the of an LH as a single 

measure of its nonorthogonality. We describe tools based on this measure, and present an 
algorithm that combines them with RLH generation to obtain designs with acceptable 

nonorthogonality. Tables for /?”” for specific values of G are reusable, as are the 

formulas for | /?”“ j . The availability of packages that easily generate LHs and the 

ubiquity of off-the-shelf analytical software offer scientists a powerful, but simple, set of 
tools to obtain a desired experimental design. 
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III. CORRECTED RLH USING ELORIAN’S METHOD EOR 

NEW DESIGNS 


This chapter explores the utility of applying Florian’s method to reduce 
correlation in LHs. We iteratively apply Florian’s method to produce new OLH and 
NOTH designs. Cioppa (2002) conjectures that it is necessary to generate a large number 
of candidate LH designs from which to choose a few that meet nonorthogonality 
thresholds before applying correlation reduction methods. In some cases, Cioppa 
suggests a million or more candidate designs. We explain the relationship of the size of 
the experiment and the number of candidate RLH designs to generate before applying 
Florian’s method. A study of this relationship reveals that the number of candidate 
designs is often much less than 1,000 for even large experiments. 

A. FLORIAN’S CORRELATION REDUCTION METHOD 

This section briefly describes Florian’s method to understand its mathematical 
foundations (Florian, 1992). Consider an initial RLH design, X, with dimension n and k. 
Because the number of unique value levels for any factor in an LH is equal to the number 
of runs (n) in the design matrix, there are no ties. We convert the design into a matrix of 
ranks, R, with the same dimension as the design matrix. The entries in each column in R 
are the ranks of the values for the corresponding variable. 

We use the rank correlation matrix of R to construct the transformation matrix. 
The rank eorrelation matrix, T, consists of the pairwise correlations of the columns of R. 
The value T.j is Spearman’s coefficient of correlation (Conover, 1999) for the columns 

R. and Rj for all pairs (ij) in R, with 

where R\ is the rank of the f' element in the 1 °^ column. Spearman’s computation of the 
correlation is a modification of Pearson’s sample coefficient of correlation, based on 
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ranks when there are no ties. Thus, it is appropriate for our use since each value in the 
design column is unique. 

Clearly T is symmetric and positive definite. Therefore, it is factorable through 
Cholesky’s decomposition (Leon, 2002). We seek a transformation matrix, S , such that 
STS is equal to the identity matrix, where S = 2 ' and T = QQ . 

Determining Q involves Cholesky’s method. Because T is symmetric, a lower 
triangular matrix, L, exists such that T = LDL , where D is a diagonal matrix of positive 
values. Cholesky’s decomposition states that L, =LD'^^and that L,L, = T . Therefore, 

2 = Li and 5 = 2 ' = )', so 5' = | . 

Florian uses S to update R. The new rank matrix is = RS and the factor 

level values of X are rearranged to meet the new rank structure, which usually results in 
reduced correlation among its columns (Florian, 1992). 

B. DIMENSIONALITY AND FLORIAN’S METHOD 

A Florian-improved RLFl (FRLFI) design is the result of a relatively simple 
method for creating uncatalogued nearly orthogonal designs. We produce one RLH, 
compute its and apply Florian’s method. We compute for the transformed 

matrix and compare the two nonorthogonality measures. If there is an improvement, we 
reapply Florian’s method. We iterate until p^^^ no longer improves and store the last 

improved design. Table 10 contains the best p^^ from 20 separate instances of this 

process for each of many different RLH design dimensions. We highlight a number of 
the design dimensions that previous size restrictions prevent. Notably, we find an 
improved n = 33, k = 16 design that has a p^^^ of 0.031, which meets Ang’s (2006) 

NOLH criteria. With Cioppa’s (2002) NOLH convention, exploring 16 factors requires 
at least 65 runs. Ang (2006) reduces the number of runs to 33. Our unique n = 33,k= 16 
design equals Ang’s achievement. Additionally, our method increases k and introduces a 
new n = 33, k = 22 design with p^^ equal to 0.034. Again using Cioppa’s (2002) and 
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Ye’s (1998) naming convention, we use the symbol to denote an FRLH design with n 
runs and k factors since it may not neeessarily result in an OLH or NOLH. 


Table 10. The best maximum absolute pairwise correlation values from 20 FRLHs 
of the same design dimension are presented. Gray-shaded design dimensions 
emphasize intervals that this teehnique ean fill in the OLFI and NOLH eatalogue, 
using Ang’s (2006) near orthogonal threshold. Diagonal-lined areas identify 
dimensions yet to meet NOLH criteria. 



K 

N 

7 

11 

16 

22 

29 

37 

46 

56 

17 

O.O.M 

O.OM 

0.105 

N.\ 

NA 

NA 

NA 

NA 

25 

0.039 

0.03^1 

0.053 

0.080 

NA 

NA 

NA 

NA 

33 

0.028 

0.030 

0.031 

10.0341 

0.0.54 

NA 

NA 

0.056 

NA 

49 

0.019 

0.018 

0.018 

0.020 

0.022 

0.036 

NA 

65 

0.013 

0.014 

0.013 

0.015 

0.015 

0.017 

0.019 

0.039 

97 

0.009 

0.009 

0.010 

0.010 

0.010 

0.010 

0.010 

0.012 

129 

0.007 

0.008 

0.006 

0.007 

0.007 

0.006 

0.007 

0.007 

193 

0.005 

0.005 

0.004 

0.004 

0.005 

0.005 

0.005 

0.005 

257 

0.003 

0.004 

0.003 

0.003 

0.003 

0.004 

0.003 

0.003 

513 

0.002 

0.002 

0.002 

0.002 

0.002 

0.002 

0.002 

0.002 


For design dimensions that meet our guidance, Florian’s method is often 
suffieient to produee an NOLH design from a single RLH. Cioppa (2002) generates 
millions of designs before seleeting a few on whieh to apply Florian’s method. 
Tang (1998) empirieally studies the effeets of initial designs when redueing eorrelation. 
Tang finds that for small values of n and k, the initial design has a large effeet on the final 
design’s eorrelation value. To mitigate this effect, Tang generates three designs and 
ehooses the one with the least eorrelation as the initial design. As n increases, the effeets 
of the initial design diminish. We provide empirical evidence that, for smaller design 
dimensions, the initial design has greater impact. 

We produce many new NOLH designs with iterative applications of Florian’s 
method. While previous methods are constrained to one exact design for a given 
dimension, our technique can result in many NOLH designs for the same dimension. 
Additionally, this method results in many new large NOLH designs. For instanee, we 
ereate an Aj™ design that has = 0.00267 in 50 minutes using a standard 2 gigahertz 


desktop eomputer with 3 gigabytes of RAM. 
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We further examine the role that design dimensions play in the effeetiveness of 
Florian’s method. Using only one eandidate RLH for eaeh design dimension, we 
examine 53 different design dimensions in whieh the ratio of k and n is less than or equal 
to 0.333. Of the 53 designs, 52 met NOLH eriteria after iterative applieation of Florian’s 
method. In all but three of the designs, n was greater than or equal to 49. After empirieal 
evaluation, we develop a rule of thumb that any design dimension, where n> 50 and 

fl 

A: < —, likely results in a design that meets NOLH eriteria. 

A eomparison of these new designs with eatalogued NOLH designs provides 

some gauge for the veraeity of our elaim regarding Florian’s method. Table 11 gives 
Pmap before and after applieation of Florian’s method to a single RLH. We eompare an 

Fjf design with the aetual value of the eatalogued NOLH design. In ten trials, we 
see that all ten Fjf designs eaeh have a that is less than the A“ p^^. We 
aeknowledge that Cioppa also sought spaee-fdling oharaeteristies for his designs. We 
inelude the eondition numbers (CN), whieh also show improvement. Other trials for 
different design dimensions yield similar results. Although not all eomparisons favor the 
F/ designs as having better properties than designs, they are very similar in 
measured values and still meet nonorthogonality thresholds. 
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Table 11. 


Comparison of values and CNs for RLH, Fjf, and N^l. 



Before Florian 

p65 

w65 


P map 

CN 

Pmap 

CN 

Pmap 

CN 

Trial 1 

0.3012 

6.7312 

0.0176 

1.0929 

0.0219 

1.1030 

Trial 2 

0.4272 

5.5712 

0.0191 

1.0992 

0.0219 

1.1030 

Trial 3 

0.4176 

7.5441 

0.0149 

1.0852 

0.0219 

1.1030 

Trial 4 

0.3831 

5.9205 

0.0215 

1.1028 

0.0219 

1.1030 

Trial 5 

0.3216 

6.5351 

0.0156 

1.0818 

0.0219 

1.1030 

Trial 6 

0.3475 

6.2600 

0.0157 

1.0853 

0.0219 

1.1030 

Trial 7 

0.3096 

6.6278 

0.0172 

1.0893 

0.0219 

1.1030 

Trial 8 

0.4438 

7.5144 

0.0158 

1.0949 

0.0219 

1.1030 

Trial 9 

0.4882 

7.9720 

0.0179 

1.0911 

0.0219 

1.1030 

Trial 10 

0.3221 

6.7882 

0.0198 

1.0901 

0.0219 

1.1030 


These results lead to a loose constraint for applying Florian’s method. When 
n 

u>50and usually transform RLH designs to meet Cioppa’s (2002) 

NOTH criteria; < 0.03 and condition number less than or equal to 1.13, as well as 

Ang’s (2006) thresholds of p^^ < 0.05 and condition number less than or equal to 1.20. 

Ang’s criteria allow more latitude and we therefore adopt them as our own. We note that 
a departure from these guidelines still results in a reasonable reduction of the column 
correlation in the design. Later discussions in this study show new methods that further 
relax even these loose constraints. 

C. CONSTRUCTING NEARLY ORTHOGONAL LATIN HYPERCUBES 

A methodical process to generate NOLH designs with desired orthogonal 
characteristics may be easily implemented. The first iteration of Florian results in a new 
design matrix, Anew However, p^^ for this new design matrix may not be acceptable 

and it may be possible to improve Anew even further. Steps to iteratively update the latest 
design to create a design with the smallest p^^ follow. 

Step I: Generate an RLH, Xom- 

Step 2: Compute p^^ of and designate as . 
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Step 3: Apply Florian’s method to Xom to create Anew 
Step 4: Measure of Anew and designate as . 

Step 5: If is less than , then rename Anew to Xm and to 

and return to Step 3. 

Step 6: At termination, A^w is the best design. 

Florian’s method overcomes the dimension limits that other correlation reduction 
methods face. To illustrate the extent that Florian’s method fills the gaps in NOTH 
design dimensions, we explore multiple LH designs that receive this treatment. Table 12 
shows p^^ values for a collection of new designs after treatment with Florian’s method. 

As in previous discussions, we do not develop designs where n<k . As expected (Owen, 
1994), designs in which k^n demonstrate an increase in p^^^. This table includes a 

number of design dimensions that do not exist in the current catalogue of OLH and 
NOTH designs. To follow the guidelines for NOTH design dimensions, a design with 22 
factors (m = 7) requires 129 runs. We show that a design for 49 runs and 22 factors with 
Pmap 0.020 can be obtained by Florian’s method. In fact, we catalogue NOTH designs 

for 97 and more runs, as well as for values of n between these values. 


Table 12. These are the best maximum absolute pairwise correlation values from 20 
FRLHs of the same design dimension. These designs are a continuation of 
Table 10. All of these design dimensions, as well as the design combinations 
between their intervals, fill many of the gaps in the OLH and NOTH catalogue. 



K 

N 

1 

11 

16 

22 

29 

37 

46 

56 

67 

79 

92 

106 

121 

137 

154 

172 

17 

0.054 

0.061 

0.105 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

25 

0.039 

0.039 

0.053 

0.080 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

33 

0.0281 

0.030 

0.031 

10.0341 

0.0.^4 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

49 

0.019 

0.018 

0.018 

0.020 

0.022 

0.036 

ii.i)5n 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

65 

0.013 

0.014 

0.013 

0.015 

0.015 

0.017 

0.019 

0.039 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

NA 

97 

0.009 

0.009 

0.010 

0.010 

0.010 

0.010 

0.010 

0.012 

0.012 

0.0530.052 

NA 

NA 

NA 

NA 

NA 

129 

0.007 

0.008 

0.006 

0.007 

0.007 

0.006 

0.007 

0.007 

0.008 

0.010 

0.018 

0.030 

0.058 

NA 

NA 

NA 

193 

0.005 

0.005 

0.004 

0.004 

0.005 

0.005 

0.005 

0.005 

0.006 

0.005 

0.005 

0.006 

0.010 

0.012 

0.042 

0.037 

257 

0.003 

0.004 

0.003 

0.003 

0.003 

0.004 

0.003 

0.003 

0.004 

0.004 

0.004 

0.004 

0.006 

0.005 

0.005 

0.012 
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n 

We conclude that, for an experiment requiring n > 50 and k< — ,WQ can produce 

many new NOLH designs with Florian’s method. As Table 12 shows, we can sometimes 
violate this rule of thumb and still produce an NOLH, such as the n = 49, 
k = 22 design (shaded in Table 12) discussed previously. We also see that as k 
approaches n, Florian’s method finds it difficult to produce a design with <0.05, 

such as the n = 25, k = 22 design that has = 0.08. We explore a new approach to 
solve this new aspect of the experimental design problem. 

D. SUMMARY 

The body of work to reduce or eliminate correlations in LH designs is extensive. 
OLHs (Ye, 1998) and NOLHs (Cioppa, 2002) have earned reputations as highly useful 
designs. Historically, construction of these designs is computer intensive and time 
consuming. Our findings show that Florian’s method and RLH generation can save the 
analyst a significant amount of time and computational cost. Moreover, by starting with 
new RLHs, we can generate many such designs and use other criteria (e.g., space-filling 
properties or higher-order correlations) to discriminate between multiple NOLHs. Given 

fl 

our simple rule of thumb, where n > 50 and ^ NOLH designs are 

possible with this technique. In many cases where this rule of thumb does not hold, 
Florian’s method is still effective enough to produce a design that meets NOLH 
nonorthogonality criteria. 
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IV. A MIXED INTEGER PROGRAMMING APPROACH TO 

MINIMIZE 


This chapter explores optimization as a means to construct orthogonal Latin 
hypercube (OLH) and nearly OLH (NOLH) designs, thereby further expanding the 
situations in whieh they ean be used. We are aware of the difficulties that an 
optimization approaeh presents for solving the experimental design problem. As n and k 
inerease, the dimensionality of LHs makes optimization an unattraetive option. As 
Chapter I explains, a vast number of unique LHs are possible for even moderate values of 
n and k. If an exhaustive seareh were applied, even moderately large values of n and k 
would prove to be impossible to solve to optimality. For instance, to develop an Aff 
design with an optimization model that eonsiders all possible permutations may involve 
an algorithm that ean explore up to (33!)" LHs to guarantee an optimal solution. 

Techniques such as the simplex method (Nash & Sofer, 1996) are more efficient in 
finding an optimal answer when all variables are continuous. However, nonlinear and 
integer methods run into a “combinatorial explosion” (Wolsey, 1998) that make solving 
the complete design of experiment problem computationally intractable as n and k grow 
large. Therefore, we develop a construetion methodology based on a foeused 
optimization routine, which greatly expands the range of values for n and k for whieh we 
can create orthogonal and nearly orthogonal designs. 

We combine RLH generation, Florian’s correlation reduetion method, and 
optimization of a mixed integer programming problem to develop a new algorithm that 
relaxes existing size restrictions and fills many gaps that exists in the OLH and NOLH 
library. We continue to use as the key measure for discriminating between designs. 

To best illustrate the power of this new methodology, we eoneentrate on design 
dimensions that are absent and most difficult to generate with previous methods; that is, 
experiments with n<50 and k approaches n. 
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A. 


SIMPLIFYING THE COMPUTATION FOR NONORTHOGONALITY 


In explaining our optimization approach to address the DOE problem, we find it is 
more useful to discuss LHs in terms of the ranks that correspond to each factor’s value 
levels. Because LH designs assume all variables are continuous, the experimenter simply 
sets the number of unique values to equal the number of runs. Unique factor values 
ensure that there are no ties in the ranks of any column in the design. Consequently, for a 
design with n runs, each column in the LH design is merely a permutation of integers 
1 to n. 


Considering values as rank s from 1 to n simplifies computations. For instance, 

_ _ n +1 

the average value in any column, I, is always x, = x = . The corresponding variance 


^ n ^ 2 ^ 

for any column is erf = cr^ = — ^(x;-x^ =-, i.e., a constant, for any n. The 

n 12 

Y n 

covariance between columns I and m is cr,„, =- ■ Therefore, the 


correlation between columns I and m is 


Plm 


<J, 


Im 


/ 

i=i' 


n + \ 


X: - 


n + \ 


X, - 


4 


(7, <J 

I m 


i=\ 


E - 


n + 1 


(4.1) 


For any dimension n and k, we wish to find a design matrix, with the 

smallest maximum absolute pairwise correlation . We use an optimization model 
and algorithm to help minimize . 


B. A LINEAR FORMULATION OF THE MATHEMATICAL MODEL 


From Equation 4.1, the denominator is constant for all I and m. The most 
important features of the equation are in the numerator. Therefore, minimizing p^^ is 


equivalent to minimizing 


map 


= max 

l^m 


X(x'-x)(xr-x) 


(4.2) 
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Our task to minimize is constrained by the requirement that each column of X is a 

permutation of the first n natural numbers. Difficulties in formulation arise in the 
nonlinearity and nondifferentiability of the absolute-value function. To construct an 
OLH or NOTH with dimensions n and k using optimization techniques, we develop a 
linear mathematical model for Equation 4.2. 

1. An Initial Mathematical Model 

We follow the Naval Postgraduate School Operations Research format (Dell, 
2004) for presenting an optimization problem. We annotate the objective and constraint 
equations in these formulations for reference. Our initial formulation of the 
problem follows. 

Model 1 {n, k): 

INDICES 


i 

runs (alias 7)^ i 

= l,...,n 


1 

factors (alias m) 1 

= l,...,k 


VARIABLES 




level of factor on the 

run 


y 

variable for 



FORMULATION 



min 

V 


(AO) 

s.t. 

V -x)(x--x) 

i=\ 

\fl^m 

(Al) 


(=1 

\fl^m 

(A2) 


X\ ^ X‘ 

yi^jj 

(A3) 


1< X‘ < n, integer 

V/,/ 

(A4) 


The objective function AO is simply V; our constraints force V to assume the value 
at our approved solution. Because the absolute value is a nonlinear expression, 

^ The alias of an index, i, allows the programmer to use a different index, J, to refer to the same set 
of numbers. 
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constraints A1 and A2 separate it into terms that we can more readily transform into 
linear terms. Constraints A3 and A4 require that eaeh eolumn of A be a permutation of 
the integers from 1 to n. Constraints A1 and A2 are nonlinear, and constraints A3 and A4 
are noneonvex. 

2. Transformation of a Nonlinear Optimization Problem 

We reformulate our mathematical model with the goal of converting it into a 
linear optimization problem. The advantage of linear funetions is that they are proven to 
be convex and thus have one or more extreme points that eorrespond to maximum or 
minimum objeetive values (Bazaraa et ah, 2005). 

Optimization frequently centers on solving convex problems, of which linear 
types are most easily determined. The general problem of convex optimization is to find 
the minimum of a convex or quasiconvex function on a finite-dimensional convex space, 
speeified by a set of extreme points and extreme rays or vectors (Bazaraa et ah, 2005). 
As so defined, a maximum or minimum value of a convex function may be found by 
systematieally examining extreme points from the convex space (Bertsimas & Tsitsiklis, 
1997: Nash & Sofer, 1998). 

The eonstraints of the mathematieal model define the feasible region of the 
objeetive funetion. A set of convex or quasiconvex eonstraints result in a convex or 
quasieonvex feasible region, whieh eontains extreme points and an optimal solution. 
Reeiproeally, a feasible region resulting from one or more noneonvex constraints results 
in a noneonvex feasible region, whieh cannot guarantee an optimal solution. 
Unfortunately, we find the design of experiment problem to be nonlinear and noneonvex. 

Our first step to linearize the problem is to reformulate A3 by introducing binary 
variables. We represent one run (design point) i of one factor I with a set of n binary 
variables, one for each possible level j of the factor on that run. For a binary variable, 
Y. ■, if the factor I is set to the level in the run, we let the variable equal one, and 

zero otherwise. Therefore, = 1 means that for the 4* factor, the level is set at 7 in the 

3'^‘* run of the experiment. The addition of these variables requires new constraints to 
control the values that they may assume. Our revised model follows. 
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Model 2 {n, k): 

VARIABLES 

Y‘j equals 1 (0 otherwise) if faetor I is set to level j in run i 


V variable for 

FORMULATION 

min V 



(BO) 

(Bl) 

(B2) 

(B3) 

(B4) 

(B5) 


Constraints Bl and B2 are reformulations of A1 and A2. Constraints for B3 
require that exactly one level be chosen for each factor, for each run. Constraints B4 
ensure that each level is chosen exactly one time for each factor. These constraints are 
linear since they are nothing more than a summation of simple variables. However, 
constraints Bl and B2 are still nonlinear because they involve the product of Y‘j and Y/". 

3. A Complete Linear Formulation of the Optimization Problem 

There are several ways to deal with the nonlinearity in Bl and B2. One is to 
develop derivative-free heuristic algorithms (including the many varieties of local search) 
that generate many design matrices, and evaluate V for each one. Evaluating V is much 
easier than optimizing it, but we recognize that heuristics do not guarantee an optimal 
solution, although they sometimes result in one. 

Our approach to the difficulties associated with constraints Bl and B2 is to 
optimize one column (factor) of the design matrix at a time. We find level settings for a 
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given factor, m, that minimizes between m and every other (fixed) column in the 

design. Therefore, variables corresponding to factor m are the only ones that appear in 
the model, and we treat all other factors in the design as having “fixed” values, denoted 
X\. The resulting levels for the targeted factor m guarantees the lowest value for , 


given that all other factor levels remain constant. This final formulation is a single¬ 
column optimization program for a specified column, m, in a design with n runs and a 
total of k factors. 

Model 3 {m,k ,n): 

DATA 


X. level of factor I for run i in design X 

FORMULATION 

min V 


s.t. v> Z(x;-x) xjx, 

1=1 

v^-t(x!-x) YjY, 


-X 


m — 

J-^ 




n 

If =1 

7=1 

n 

Zf =1 


(=1 


K, ^ {0.1} 


\/l ^ m 

yi, i 

yj, l 


(CO) 

(Cl) 

(C2) 

(C3) 

(C4) 

(C5) 


Constraints Cl and C2 are now linear in Y"". Several of the terms in these 
constraints are constant, and can be further simplified; 

Zlv' -4 tiY’ -^l=z(v' -(-)Z(V' -fi 


(=1 


i=\ 


y=i 


/=i 


=Z(v'-dZ7i;; 

i=i j=\ 


(4.3) 
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This simplification yields our final model. 

Final model 3 {m, n, k): 

INDICES 

i runs (alias j) i=\,...,n 

I faetors l=\,...,k 

DATA 

X\ level value of faetor I for run i in the given design X 

VARIABLES 

Y” equals 1 (0 otherwise) if factor m is set to level j in run i 

V maximum absolute value of for any two columns in X 


FORMULATION 



min V 


(CO) 

1 

Al 

-J-* 

\fl^m 

(Cl) 

1=1 j=l 

\fl^m 

(C2) 

j=i 

yi 

(C3) 

II 

Vj 

(C4) 

r-ejo.i} 

V/, j 

(C5) 


For the remainder of this document, we refer to the optimization program based 
on the final model as VMIN; a minimization model for the variable V. 

C. OPERATIONALIZING THE LINEAR OPTIMIZATION PROBLEM 

This final formulization is key to our search for new designs with orthogonal or 
nearly orthogonal properties. We encode VMIN into GAMS (General Algebraic 
Modeling System, 2008), a powerful high-level modeling system for mathematical 
programming and optimization, and solve using the commercially available solver Cplex 
(2008). GAMS eonsists of a language eompiler and a suite of integrated high- 

47 



performance solvers that are specifically designed for modeling large linear, nonlinear, 
and mixed integer optimization problems. The result is a robust programming language 
that can solve problems with thousands of linear constraints and thousands of variables. 

1. The Size and Complexity of VMIN 

Because the complexity of even relatively small problems has the potential of 
taxing the model to a point that prevents it from converging to a solution, we deliberately 
choose to focus on one factor column to solve. Equations from C3 in VMIN correspond 
to a total of n equations—one for each run in the design. Because m is the targeted 
column, it contains the only set of variables in the problem—one for each level of the 
factor m. We illustrate with a linear equation for C3, where the targeted (shaded) column 
is m = 3 and run number is i - 2. If we set n = 5 runs, there will be five value levels, 
which the variables Y^., j = 1, 2, 3, 4, 5 represent in the shaded column of 
Table 13. Column (k#) and row (n#) labels in the table are for ease of reference. 

Table 13. We present how the parameters of a design and the design’s initial values 
translate into the constraints in VMIN to optimize the values of the targeted 
variable column (m = 3, k = 4, n = 5), where value levels for column 3 remain as 
variables and all other values are fixed. 



kl 

k2 

k3 

k4 

nl 

3 

4 


4 

n2 

4 

2 

^2J 

5 

n3 

1 

3 

Y^ 

2 

n4 

2 

5 

Y^ 

1 

n5 

5 

1 

Y^ 

3 


The constraint for this instance of C3 is + Y^^ + Y^^ + Y^^ = 1. We see in 

this equation that only one of the variables may equal one—that is, only one value level 
from factor 3 can be assigned to the second run of the experiment. 
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The following discussion explains the constraints involved in the linear 
formulization of the problem. For constraints Cl and C2, we consider 

X = ZLtl = = 3 and the fixed values for column s kl, k2, and k4 of the initial design 

2 2 

to construct the following pseudo expression for . 


y > 


1=1 j=i 


\fl^m 


V > (3-3)(iy,^, +27,^, +3y,>4};>5y,^5) + (4-3)(iy,^, +27,^, +37,^3 +47,>57,^ 3 ) + 
(l-3)(>, +27,^3 +37,>47,>57,^3) + (2-3)(17,>27,> 37,^3 +47,>5}^^3) + 
(5-3)0 +2> +31)3 +4> + 57 ,^ 3 ) 

y > (4-3)0 +2> +37047,>57,^3) + (2-3)0, +27,^3 +37,^3 +47,>57,^3) + 
(3-3)0, +2> +37,0<4 +05) + (5-3)(0 +27,037 i 3 +04+51)^5) + 
(l-3)(0 +27,^3 +37,047,057,^3) 

y > (4 - 3)(17,^, + 27,^3 + 37,^3 + 47,^, + 57,^3) + (5 - 3)(17,', + 27,^3 + 37,^3 + 47,+ 57,^3) + 

(2-3)0 +27,^3 +31)04> +57,3) + (1-3)0 +27,037,^3 +41)051)^3) + 

(3-3)0, +27,037,041)051)^5) 


y > 


y > 


y > 


y > 


1=1 3=1 

^(3-3)0 +27,037,041)057,^3) + (4-3)0, +27,^3 +37,^3 +47,0 57,^3)+' 

- (1-3)0, +20 +31)041)050) + (2-3)O +20 +30 +404+50) + 
J5-3)0, +27,037,0404+50) 

^(4-3)(lO +27,030 + 4 O 4 +503) + (2-3)(10, + 2 O 3 +37,047,050)+' 

- (3-3)(17,027,030 + 4 O 4 +50) + (5-3)(lOi + 2 O 2 +30 +404+505) + 
J1 - 3)(lO, +27,030 + 4 O 4 + 50 ) 

^(4 - 3)(17,^, + 27,^3 + 37,^3 + 40, + 57,^3) + (5 - 3)(17,^, + 2 O 3 + 37,^3 + 40, + 57,^3) +" 

- (2 - 3)(17,^, + 2Yl, + 37,^3 + 40, + 57,^3) + (1 - 3)(lO, + + 31)^3 + 40, + 5 O 3 ) + 

J3 - 3)(lO, +27,030 + 4 O 4 + 50 ) 


Constraints for C3 ensure that each run has exactly one value level assigned. 
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V/ 


i=i 

y3 I y3 , y3 , y3 y3 i 

y3 I y3 , y3 , y3 , y3 i 

y3 I y3 , y3 , y3 , y3 i 

y3 I y3 , y3 , y3 , y3 i 

y3 I y3 , y3 , y3 , y3 i 

Constraints for C4 ensure that eaeh value level is assigned exaetly one time. 

Zr"=i vj 

i=i 

y3 I y3 , y3 , y3 , y3 i 

y3 I y3 _,_y3 , y3 , y3 i 

-*1,2 ^-*2,2 ^-*3,2 ^-*4,2 ^-*5,2 ^ 

y3 I y3 , y3 ,y3 , y3 i 

y3 I y3 , y3 , y3 , y3 i 

-*1,4 ^-*2,4 ^-*3,4 ^-*4,4 ^-*5,4 ^ 

y3 I y3 _|_y3 , y3 , y3 i 

-' 1,5 ^-'2,5 ^-*3,5 ^-*4,5 ^-' 5,5 ^ 

Finally, constraints for C5 ensure that each binary variable assumes only the value of 1 or 
0. It requires a set of two constraints for each variable. 

i;"sjo,i} vi.j 
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VMIN constraints become more eomplex as n and k inerease. There are 25 binary 
variables, 1 eontinuous (objeetive) variable, and 66 eonstraints for this relatively small 
design. The size of the problem inereases quadratieally in the dimensions. Consider a 
problem n = 33 and k = W. The problem now involves nearly 1,100 equations and 
almost 1,000 binary variables to determine an optimal arrangement of value levels for 
just one faetor, m. The solution is a subset of variables of size n, eaeh equaling one, and 
the remaining n{n - 1) variables being zero. 

2. Applying RLH and Florian’s Method to Initiate VMIN 

Random Latin hypereube (RLH) generation and Florian’s eorrelation reduetion 
method provide an exeellent initial solution to VMIN. Preliminary eomputations to 
support an optimization proeess are eategorized as heuristie algorithms—methods to 
reach suboptimal solutions (Bertsimas & Tsitsiklis, 1997). 

A good initial solution aids the optimization routine to solve the problem more 
quiekly and with a greater ehance of reaching a low . In our process, we obtain a 
good initial solution by quiekly generating many RLHs—through k random permutations 
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of the first n natural numbers—and seleeting the one with minimum . Then, we 

iteratively apply Florian’s (1992) proven eorrelation reduetion method to ereate an FRLH 
that serves as the starting solution for VMIN. In practiee, for designs of the seale in this 
paper (or a little larger), it takes only seeonds on a modem desktop proeessor to generate 
hundreds, or even thousands, of RLHs for designs with seores of variables. The FRLH 
improvement on the best RLH usually takes a few minutes. It typieally takes on the order 
of a few hours to generate the final design—whieh likely requires several iterations of 
VMIN. Additional eolumns ean be added by appending a new random eolurnn and then 
iteratively applying VMIN. For the size of the experiments we diseuss, it usually takes 
on the order of tens of minutes to optimize the new eolurnn. 

Table 14 shows a design with n = 17 mns and k = 1 faetors. The assoeiated 
objeetive value, V, for the initial design eorresponds to p^^^ = 0.066 . Table eontents are 

the aetual levels for the eolurnn faetor for eaeh mn. There are 17 = 289 variables for 
eaeh speeific faetor, m, in this problem: . Table 15 

illustrates the initial binary value for =1—meaning that the aetual level value for 

faetor 6 on the 2"‘* mn is 4, and reeiproeally Y^j = 0 for all 7^4. 
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Table 14. This n = \l,k = l design, = 0.066 , is a result from generating 

numerous RLHs, seleeting the design with the least nonorthogonality, and 
applying Florian’s method iteratively. It is the initial design before using VMIN. 



kl 

k2 

k3 

k4 

k5 

k6 

k7 

nl 

15 

2 

7 

5 

8 

9 

15 

n2 

17 

15 

11 

11 

6 

4 

14 

n3 

5 

7 

8 

14 

2 

16 

4 

n4 

13 

10 

1 

9 

15 

13 

5 

n5 

11 

6 

17 

3 

12 

15 

1 

n6 

3 

14 

14 

13 

10 

6 

10 

n7 

7 

16 

2 

6 

11 

14 

12 

n8 

8 

9 

13 

1 

16 

7 

8 

n9 

1 

11 

5 

2 

7 

8 

6 

nlO 

14 

13 

3 

15 

5 

11 

3 

nil 

10 

5 

12 

10 

4 

1 

2 

nl2 

2 

1 

4 

8 

3 

2 

11 

nl3 

4 

3 

6 

16 

17 

12 

16 

nl4 

6 

12 

16 

17 

13 

10 

9 

nl5 

9 

17 

10 

4 

9 

3 

13 

nl6 

16 

4 

9 

12 

14 

5 

7 

nl7 

12 

8 

15 

7 

1 

17 

17 


For a more expansive look into this example, we eontinue to foeus on eolurnn 
m = 6 of the initial design. The initial F)” solution set for m = 6 is more easily seen in 

Table 15 as a two-dimensional display of variable values that show exaetly one faetor 
level (/) assigned to eaeh run (n). This matrix of zeroes and ones corresponds to the 
values in column vector (k6) from Table 14. 
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Table 15. Binary variable values for faetor m = 6 in the initial design show the 
sparseness of this single eolumn optimization problem. A set of these variables 

corresponds to each factor. 
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We apply VMIN to target factor m = 6 to improve V. In the following paragraphs 
we discuss the root mean square (Owen, 1994) to explain the rationale for selecting m = 6 
as the column to optimize. VMIN focuses on the problem of populating the above matrix 
with ones in the appropriate cells, and zero elsewhere. VMIN considers each of the 289 
variables in the problem, and through the Cplex solver, determines a solution (i.e., an 
arrangement of values of “l”s in the matrix) that minimizes V to within a user-specified 
tolerance of the optimal solution. 

Tolerance is the difference between the notional optimal value of the objective 
function and the actual computed value, divided by the notional value, and multiplied by 
100%. Common practice sets tolerance at 10%, but we use 1% in VMIN to approach 
orthogonality. We adjust tolerance as design dimensions near saturation. 
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3. Iterative Application of VMIN 

Iterative application of VMIN constructs the entire design. Solving one instance 
of VMIN guarantees an optimal solution for the current factor of interest. However, our 
simplification of the optimization problem to one column leaves us with a nonoptimal 
solution for the overall DOE problem. To reach orthogonality or near orthogonality 
requires a deliberate process. Therefore, after determining the best value levels for one 
column, we select a new column to optimize. We repeat this process until we have an 
NOTH, or there is no further progress. 

Selecting the next column to optimize can become complex. There are other 
methods for selecting the next column to optimize, such as selecting a column that is 
associated with , but this value exists for at least one other column. It is possible 

that even more than two columns are associated with . Instead, following Owen 

(1994), we use root mean square (rms) as the measure for selecting the next column to 
optimize. Computing rms for a column m involves summing the squared correlation 
coefficients between m and all other columns in the design, and dividing the total with the 
number of unique column pairs, which yields 

tpl 

pLW=^. 

where p,^ is the correlation coefficient in (1.1), and k is the number of columns in the 
design. We select the column with the largest rms as the one to optimize. The greatest 
rms identifies the column that has, on average, the largest squared correlation with any 
other column, and is therefore the most problematic, relative to all other columns. 
Selecting this column for optimization increases the chances for reducing the design’s 
overall nonorthogonality. In practice, this approach has proven straightforward, tractable, 
and successful. Although iterative column optimization may not terminate at a solution 
that equates to the lowest possible p^p, it provides a near optimal solution that has a 

very good chance of meeting NOTH criteria. 

Numerous difficulties emerge when using optimization to solve the complete 

DOE problem. It is a primary reason that scientists have sought other methods for 
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developing experimentation sehemes. Our presentation of a solution for m = 6 in 
Table 15 offers a glimpse of the issues that an optimization approach faces. The two- 
dimensional matrix presents the VMfN attempt to position “l”s where no two entries are 
in the same row or column. Taking this same problem vertically for each factor further 
complicates the arrangement of “l”s. A new restriction that no two-factor (m) matrices 
should have the same run (n) and level (/) cell filled complicates the problem sufficiently 
to challenge the most robust optimization schemes. Such a problem requires m sets of n 
variables moving simultaneously. The resulting problem is nonlinear and nonconvex. 

Restricting the optimization problem to one factor at a time allows us to transform 
it into a linear one. Our formulation offers a means for reaching a focused solution that 
guarantees an optimal solution for any given column, m. The method that we employ to 
generate an initial solution is grounded in optimization theory and practical experience. 
Our iterative application of VMfN methodically reduces to an acceptable level. 

Short of optimality, creation of new designs that meet NOTH criteria satisfies our study 
goals. This method breaks free from previous constraints to produce NOLHs in design 
dimensions that other approaches do not. 

D. NEW DESIGNS FROM VMIN 

This section describes the full impact of combining FRLH and optimization into a 
new methodology. Their synthesis produces new NOTH designs. We quantify our 
method’s effectiveness and present excerpts of new NOTH designs that are not possible 
to construct using earlier techniques. The relative ease and flexibility of this technique 
make it an attractive option for constructing new and efficient computer experiments. 

As we previously stated, small NOTH designs (n < 50) are difficult to generate if 

fl 

A:> —. Cioppa’s (2002) method generates millions of LHs before selecting one that 
warrants application of Florian’s method. In practice, Florian’s method successfully 
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n 

produces NOLH designs for n >50 and with a single RLH generation,* but eannot 

guarantee an NOLH for dimensions that violate these eombinatorial eonditions. 

Our optimization program is a eomplement to FRLH. It not only provides a 

n 

means to ereate NOLH designs with dimensions n>50, —<k<n, it covers design 

dimensions for any n, and k ^ n . Designs in Appendix A demonstrate our technique’s 
utility. 

Cioppa and Lucas (2007) discuss a number of OLH and NOLH designs and 
compare their orthogonal properties with the average correlation measure of simple LHs. 
An exeerpt from their artiele shown in Table 16 shows their dimensions and 
eorresponding orthogonal charaeteristies. 

Table 16. An excerpt from Cioppa and Lucas (2007) compares RLHs and Cioppa’s 
(2002) OLH and NOLH designs with respect to their orthogonal charaeteristies. 


Design 

Max Pairwise 
Correlation 

'^11 

0 

Best Nil 

0.0234 

Mean RH 

0.4015 

^16 

0 

Best Nil 

0.0219 

Mean RH 

0.3194 

Ol 29 

0 

Best A'f 

0.0015 

Mean 

0.2332 


Gaps in the current OLH and NOLH library exist because of the rigidity of 
dimensional eonstraints. To aehieve orthogonality, Cioppa’s existing convention for 
design dimensions requires an increase in the number of runs from 33 to 65 when the 

* In practice, one RLH generation is nearly always sufficient to produce a NOLH using Florian’s 
method. However, choosing the RLH with the least maximum absolute pairwise correlation, from up to 
1,000 RLHs, helps increase the chances of producing an NOLH. 
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number of factors increases from 11 to 12. In this example, there is a breach in the OLH 
and NOLH library for designs that have less than 17 runs and l<k<l, or for designs 
18 < n < 32 and 8 < A: < 11. 

Efforts to produce smaller designs benefit from starting with a “good” initial 
design (Tang, 1998). We consider a starting design for an 8 x 3 RLH, which violates our 

g 

rule of thumb; n = 8 < 50 and k=3> — = 2.67 . There are no known catalogued OLH or 

NOLH designs with these dimensions. We generate an LRLH design, which has a 

equal to 0.0476, shown on the left side of Table 17 Although the LRLH meets Ang’s 
near-orthogonality criteria, we employ VMIN. We use the LRLH design as a heuristic 
solution to initiate VMIN. VMIN optimizes the third column resulting in a fully 
orthogonal design, O*: = 0 and a condition number of 1.0. We add the final design 

on the right-hand side of Table 17 to the OLH library. 

Table 17. Results from a small design after iterative application of Llorian’s method 
on the left-hand is not an OLH design. It can be used as the initial design for 
VMIN. The final design is orthogonal, as shown on the right-hand design. 


We summarize our methodology to construct new designs in the following steps 
and present it graphically in Ligure 9. 

Step 1: Consider the dimensions, n and k, for the desired design. 

Step 2: Determine if OLH or NOLH designs exist for desired dimension; 
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If OLHs or NOLHs with the same design dimensions are available, use them. As an 
alternative, determine if it is possible to eonstruet a new design based on 
previous methods. 

If the designs do not exist in the library, or design dimensions do not meet OLH and 
NOLH eonventions, go to Step 3. 

Step 3: Determine if it is neeessary to apply the eombined teehnique or only 
Florian’s method. 

n 

If n > 50 and ^ ^ ’ generate one or more RLHs and iteratively apply Florian’s method. 

Upon termination, the resulting design will likely possess NOLH properties—if not, try a 
few more. If unsueeessful after a few iterations, go to Step 6. 

n 

If n > 50 and —<k<n,ox n < 50, go to Step 4. 

Step 4: Generate up to 1,000 RLHs and seleet the design with the minimum and 
designate it . 

Step 5: Iteratively apply Florian’s method on until does not improve. 

Designate this improved design as ; the result of a heuristic approach. 

Step 6: With as a start point, implement VMIN iteratively. 

Step 7: Record the resulting design. If the design is unsatisfactory, return to Step 4 and 
repeat with new RLHs. 

A departure from previous techniques, we find that our methodology usually 
uncovers more than one design from which to choose. Time permitting, an experimenter 
can generate a number of candidate NOLHs, and then use criteria other than p ^^, such 

as space-filling properties, to select the NOLH that best meets the experiment’s 
requirements. Some methodologies are deterministic in their final designs, such as 
Steinberg and Lin (2006) and Ang (2006), leaving the experimenter with only a single 
choice at their limits. The opportunity to quickly produce different NOLH designs, with 
few dimensional constraints, is an important strength of our method. Figure 9 outlines 
the steps to achieve these designs. 
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step 1 : 


Step 2: 


Step 3: 


Step 4: 


Step 5: 



Figure 9. This flow chart presents the combined process for matrix transformation and 
optimization of an MIP to construct an OLH or NOTH design. 

We demonstrate our methodology with a design for 14 runs and 7 factors. The 
resulting designs are in Table 18. 


Step 1: The desired design has dimensions n = 14 and k = l. 

Step 2: These dimensions do not conform to OLH or NOTH convention, which 
prompts the experimenter to go to Step 3. 

k 1 

Step 3: Because n = 14<50 and k = l ^ — = — = 0.50 >0.33, Florian’s method 

n 14 

alone will not likely result in an NOLH design. Go to Step 4. 

Step 4: Using scripts that we developed in R software, we generate 1,000 RLHs and 
select the design with the minimum and designate it The resulting 

design has equal to 0.367 and condition number 4.489. 

Step 5: Iteratively applying Florian’s method on results in a design with a 

Pmap 0.099. We designate this heuristic as and present it on the left-hand side 
of Table 18. 
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Step 6: Iterative implementation of VMIN on results in anNOLH design with 
Pmap 0.033 and a eondition number of 1.134. We show the final design on the 
right-hand side of Table 18. 

Step 7: We reeord this NOTH, , as an entry into the NOTH library. 

Table 18. The result from iterative applieation of Florian’s method on a randomly 
generated n= \A,k=l design is the initial design for VMIN, as shown on the 
left-hand side of the table. It does not meet NOTH orthogonality eriteria. The 
final design for n = 14, A: = 7 after VMIN is a new NOTH design, presented on the 

right-hand side. 
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Final Nearly Orthogonal Design: 

Pmap =0.033, condition(X'X) =1.13 

K1 

K2 

K3 

K4 

K5 

K6 

K7 

8 

4 

5 

3 

9 

13 

13 

7 

5 

1 

9 

14 

8 

6 

14 

3 

9 

7 

3 

7 

2 

4 

7 

3 

12 

1 

5 

8 

3 

1 

11 

14 

11 

2 

10 

10 

2 

14 

4 

8 

11 

7 

13 

8 

4 

6 

2 

3 

9 

2 

6 

8 

10 

5 

14 

5 

5 

9 

2 

1 

10 

4 

3 

11 

12 

6 

11 

13 

12 

4 

12 

13 

7 

13 

6 

10 

14 

6 

14 

13 

8 

7 

6 

1 

1 

11 

10 

2 

4 

9 

11 

9 

10 

12 

5 

12 

1 

12 


We reiterate that this proeess is readily repeatable—using new random number 
streams to yield a new starting RLH—and develop many different NOLHs with our 
required dimensions, perhaps using other eriteria, sueh as spaee-filling properties or 
eonsideration for higher order terms (interactions, quadratic, etc.) to select the 
best design. 

An extension of this example demonstrates the power and flexibility of our 
combined methodologies. To develop Vjj we use a new random stream to build a new 
n = 14, A: = 10 RLH design, and iteratively apply Florian’s method to produce a heuristic 
solution to initiate VMIN. An V'q design was possible with this method. Adding one 
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new column at a time from a random permutation of the value levels and applying 
VMIN, we construct the nearly saturated NOLH design, NI 2 , shown in Table 19. 

Table 19. The NI 2 developed by RLH, FRLH, and VMIN emphasizes the flexibility 

of our methodology to construct any number of different NOLH designs (to 
include saturated ones). Note that none of the columns of this design are identical 
to the design on the right-hand side of Table 19. It shows that this new design is 
not merely an extension of the n= 14, k = 7 design, although our technique can 

certainly extend smaller designs. 
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Certainly another approach would be to use the initial n = 14, k = 7 design from 
Table 18 and add one column at a time, but starting at this low number of variables is 
cumbersome. We chose to be aggressive and began with an k:n ratio equal to two-thirds, 
violating our “one-third” rule of thumb. The flexibility of saturated designs permits 
experimenters to take any subset of columns to develop a new design, when for example, 
more degrees of freedom for error in a linear fit is desirable. We complete our 
development of this design with a final application of Florian’s method to determine if an 
improvement is possible. There is no improvement in and we enter Njj into the 
NOLH catalogue. 
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Our method requires relatively little time and resources. We construct the design 
using R software and GAMS. Generating 1,000 RLHs in R requires only seconds for a 
14 X 10 design matrix. Applying Florian’s method iteratively takes a similar amount of 
time. In practice, generation of 1,000 small RLHs, selection, and administration of 
Florian’s method are immediately sequential and involves less than two minutes. The 
GAMS program requires the most time, but current applications for small designs show 
that it is not uncommon for VMIN to reach completion within five minutes. 

We use the largest of the “small” catalogued OLH and NOTH designs —n = 33 
and k = 11—to illustrate the minimal resource investment for our methodology. 
Generating 1,000 RLHs, we create an FRLH with equal to 0.0307 and a condition 

number of 1.156, which meets our NOLH criteria. We complete this process in less than 
30 seconds on a standard desktop processor. Applying VMIN is not necessary to meet 
NOLH criteria for a 33 x 11 design matrix. 

As a test case, we develop a saturated design that does not adhere to OLH or 
NOLH dimensional constraints: n = 33 and k= 32 (see Appendix A). Approaches from 
Ang (2006), Cioppa (2002), and Ye (1998) do not develop designs to explore 32 
variables in so few runs. Ang (2006) requires 65 runs to address 32 variables. Cioppa 
(2002) requires a sample size of 513, while Ye (1998) requires an increase in the number 
of runs to 131,073, for m = 17. Steinberg and Lin (2006) recognize that their 
methodology faces limitations in design dimensions. A comparison of the methods to 
construct a design to explore 32 factors is in Table 20. The comparison shows the 
advantages of our new technique over these other construction methods. 


63 



Table 20. A comparison of different approaches to develop a design that can explore 
32 variables shows that our new S-NOLH design requires the least number of 
runs, and is therefore most efficient. Other approaches can theoretically develop 
experimental designs to explore 32 or more variables, but to the author’s 
knowledge, none are currently in the catalogue of OLH and NOTH designs. 


k = 32 

Ye 

Cioppa 

Ang 

Steinberg and Lin 

New 

Required n 

131,073 

513 

65 

64 

33 

Pmap 

0 

0 

0 

0 

0.0435 


In the process of developing an design, we also generate three new, 
individually unique designs for the same dimensions, each meeting orthogonality criteria. 
The best design has = 0.0435 . Two other designs have values of 0.0465 and 
0.0451. There are nine other new and different designs that do not meet NOTH criteria; 
they possess values of p^^^ between 0.05 and 0.06. Although other techniques can 

theoretically develop a design to explore 32 or more variables, such efforts (e.g., Ang, 
2006 and Steinberg & Lin, 2006) result in exactly one 
such design. 

Our recent discussion traces known design dimensions, but the flexibility of our 
technique results in previously unknown OLH and NOLH designs. We begin with 
Cioppa’s design as a start point. Adding a new column consisting of an ordered 

vector of 1, 2,..., 17, we apply VMIN. The result is a new orthogonal design, Ol^. A 
repetition of this process produces another completely orthogonal design, O,’. The 
author knows of no other method that produces an OLH of this dimension. Our extension 
of this process does not produce completely orthogonal designs, but meets nearly 
orthogonality criteria for an design. Beyond 13 factors, we apply our full 

methodology to produce an NlJ design with p^^^ = 0.0466. We also initiate a new 
FRLH, discarding Cioppa’s O!/, and create a fully saturated NOLH, NH, with 
Pmap - 0-0491 as shown in Table 21. Although we do not use Cioppa’s original O]'^ as a 
basis, the design still retains good properties, such as space filling (see Appendix B). 
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Table 21. This is a new design from an original FRLH for 17 runs. We extend the 
number of faetors to explore until it is fully saturated. It allows an experimenter 
to explore 16 factors within 17 runs. The k:n ratio of 0.94 is large and makes the 

design a good screening plan. 
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We complete our examples with a design that does not follow current size 
conventions. We seek a design that explores 20 factors within 25 runs. The k:n ratio is 
0.80. There are no similar designs to use as starting templates. Neither n nor k is a factor 
of two. After initiating a new FRLH, we produce a new NOTH design, N^, 
Pmap =0.0439 within 10 minutes using VMIN. After iteratively adding columns we 
produce a final S-NOLH (), shown in Appendix A. An alternative for extending an 
A 23 to become an A 24 is to add a new row (for the new level 24*) to ^ 23 "^ and apply 
Florian’s method and then VMIN to produce an NOTH. Taking the new NOTH (now 
A 23 ) we add a new column and again apply Florian’s method and VMIN, respectively. 
The result is a new saturated NOTH for A 24 . The flexibility of this alternate process is 
an important attribute of this method. 

Table 22 compares a few design dimensions from different construction 
methodologies. It is evident that our new method of combining FRLH and mixed integer 
programming (FRLH-MIP) has the flexibility to create unique designs that cater to the 
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experimenter’s needs. As previously mentioned, Steinberg and Lin (2006) require 64 
runs to explore 32 factors, while FRLH-MIP uses 33 runs (Appendix A). This short 
discussion encompasses three unique design combinations for each of the given 
techniques. The new designs are saturated and further fill the OLH and NOTH library. 

Table 22. A summary of techniques from Cioppa, Ang, Steinberg and Lin, and our 
new method shows that the FRLH-MIP method is a viable option to extend the 
library of OLH and NOLH designs. A label of “NA” designates dimensions not 
currently available from the technique. 



Cioppa 

Ang 

Steinberg 
and Lin 

New 

n 

17 

17 

16 

17 

k 

7 

8 

12 

16 

Pmap 

0 

0 

0 

0.0490 






n 

33 

33 

NA 

33 

k 

11 

16 

NA 

32 

Pmap 

0.0234 

0 

NA 

0.0451 






n 

65 

65 

64 

64 

k 

16 

32 

56 

63 

Pmap 

0.0219 

0 

0 

0.0443 


E. SUMMARY 

Combining FRLHs with an optimization routine to construct new designs 
provides the freedom to create the variety of orthogonal and nearly orthogonal design 
dimensions that we have discussed. The capability to produce many different NOLH 
designs with the same dimensions introduces great possibilities in stacking and crossed 
designs. With a choice of different designs, possessing acceptable nonorthogonality 
traits, scientists can use other measures to discriminate among them. The , and 

A 32 designs in Appendix A are examples of this flexibility. Examining each design 
shows that they are not dependent on each other (i.e., the design is not just a 
one-column extension of ). Eliminating one column from ^ 3 ^, or two columns from 
A 32 ’ creates three different designs. In comparison with the effort that other 
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methods require to produee one design, the undertaking for our teehnique is relatively 
trivial. Coupled with its flexibility in design dimensions, our approaeh provides seientists 
with a new tool to fill mueh of the OLH and NOLH library. 
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V. MANAGING MIXED-FACTOR, MIXED-LEVEL 
EXPERIMENTS WITH A NEW DESIGN METHODOLOGY 

In accordance with the LHS eonstruet (MeKay et ah, 1979), OLH and NOLH 
designs assume that all variables are eontinuous. Frequently this assumption is false. 
Many experiments inelude a number of discrete variables that do not neeessarily have the 
same number of value levels. Converting (by rounding) the aetual values of these 
diserete variables onto a raw OLH or NOLH design, espeeially when there are a small 
number of design points, often results in an overall design with poor orthogonality 
properties. We designate these types of designs as mixed-faetor, mixed-level 
(MFML) experiments. 

MFML experiments diminish the advantages of OLH and NOLH designs. In this 
ehapter, we explore means to mitigate these disadvantages and restore near orthogonal 
properties. The foundation for ereating a nearly orthogonal MFML design is the set of 
eontinuous variables from whieh we produee an NOLH using our new teehniques. Our 
study exploits the dimensional flexibility of our new designs, by eombining them with 
staeking methods and intelligent applieation of proven design teehniques. The resulting 
MFML designs retain mueh of the orthogonality properties of the basie NOLH, thereby 
maintaining their utility. In praetiee, this approaeh has worked well for ereating an 
efficient design for a problematie set of faetors. 

A. THE BASE CONTINUOUS DESIGN 

The set of eontinuous variables in the experiment determines the base design and 
is the foundation of the MFML methodology. Integer variables with a large number of 
value levels may also be part of the base design. The complete set of variables in the 
base design provides the flexibility in our methodology. Because the number of runs for 
this subset of variables ean usually be set to any number n, the base design ean match the 
design for the set of diserete variables as needed. 

Previous ehapters describe the ease of generating an OLH or NOLH from a set of 
eontinuous variables with a eonfluenee of one or more teehniques. The lone “hard” 

eonstraint with this methodology is that n> k. However, some analysts may ehoose to 
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avoid saturated designs. Spaee-filling eharaeteristies may be an important feature for the 

k 

experiment. The analyst may also deeide that a design where — < 0.33 is more 

n 

appropriate, or desire a greater number of degrees of freedom. No matter the method for 
selecting n, it presents an initial restriction. We designate this initial restriction in the 
number of runs as n ,, where tij > k , and use it to guide constructing a separate design for 
the set of discrete variables. 

B. DISCRETE INTEGER VARIABLES AND STACKING METHODOLOGY 

The set of discrete variables presents difficulties for creating a LH design that 
possesses an acceptable . Often, discrete integer variables do not have the same 

number of value levels. In such cases where sample size is a constraint, there may be 
little chance for creating a complete balanced design for them (Box et ah, 1978; 
Satterthwaite, 1959). We first consider cases when there is a single discrete integer 
variable mixed with an experiment that predominantly contains continuous variables. 
Our method also handles subsets of discrete variables, each with differing numbers of 
value levels. A part of this process uses stacking methods—i.e., permuting the columns 
and appending additional rows or another design (Cioppa & Lucas, 2007). The full 
methodology includes stacking two or more designs, adding columns for binary factors 
through random permutations of {0, 1}, and crossing quantitative designs with designs 
for qualitative variables. The goal for systematically constructing an MFML experiment 
is to create a nearly orthogonal random balanced design (Satterthwaite, 1959). 

I. One Subset of Discrete Variables with a Common Value Level 

We examine the set of discrete variables and identify the value level, which we 
denote as f,., for each variable, i = l,2,...,w. If it is reasonable to use a common value 

level i (i.e., all discrete variables have the same number of value levels), it is much 
easier to generate a total MFML design with nearly orthogonal properties. Given that i 
is greater than the number of discrete variables, d, we apply similar techniques for 
creating an NOLH from continuous variables. From empirical study, cases in which 


70 



i>\0>d show that our methods often result in a design with an aeeeptable . For 

oases in whioh i <d ,we split the set of discrete variables into smaller subsets such that 
i> d is true for each new subset, permitting the application of previous techniques to 
create an NOLH. In the next section, a small example demonstrates the significance of 
establishing these constraints on £ . 

2. Addressing More Than One Subset of Discrete Variables 

For experiments containing several discrete variables with varying numbers of 
value levels, we group the discrete variables into subsets with corresponding common 
f's. Subject matter expertise and the analyst’s reasoning can simplify the number of 
subsets with which to work. The concept for developing a complete design with these 
disparate subsets of discrete variables is to incorporate each subset design as an LH into 
the overall design. Figure 10 illustrates this idea later in the section. 

We identify w subsets of discrete variables and designate the discrete subset 
designs as DV^,DV 2 ,...,DV^. Each subset has a corresponding number of value levels, 

£^,£ 2 ,■■■£„, from which we determine the least common multiple (LCM) and designate it 
as rij^y , which is the minimum total number of runs for all discrete variables in the 
design. For each , there exists an integer value h,., such that •b. = n^^y, / = 1,2,..., w, 
and determines the initial number of stacks for each DV.. We adjust the number of 
stacks to match Hj if rij < rij^y , or increase rij if > n^^y ■ 

The intent of creating a balanced design LH (Box et al., 1978; Satterthwaite, 
1959) drives our choice for grouping discrete variables. We contemplate fto determine 
whether it is possible to incorporate a smaller DV within a larger DV. For instance, we 
consider three sets of discrete variables with value levels 5, 8, and 10, thus 
LCM =40. We denote each design as DV^,DV 2 ,DV 2 , respectively. A savvy 
analyst could reasonably, if resources permit, decide to increase the number of value 
levels for DV 2 from 8 to 10, thereby creating a larger group (DVj) based on 10 value 
levels. This simplification makes a balanced design easier to obtain for a reasonable n. 
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The experimenter matehes two staeks of DV^ within every one DV^. We update LCM; 
n^y=LCM = 10 . 

To further elarify the MFML proeess, we establish that the base design of 
eontinuous variables requires n = 40 design points, based on the original LCM. We 
determine the final number of staeks for DV^ and DV^ based on n to be 8 and 4, 

respeetively. The four staeks of DV^ aet as the base stack for the eight staeks of DV^. A 
short program in R performs iterative random stacking of DV^. For each iteration of 
stacking, we append the new stack to the stacks of DV^and compute The 

experimenter can limit the number of iterations to a number of trials deemed reasonable 
or a threshold for an acceptable . 

A final generation of a base design from the remaining continuous variables 
results in an MFML design with the structure shown in Figure 10. We note that the 
flexibility of our new NOLH designs allows the base design to adjust to the discrete 
variable designs. For clarification, we rename DV^ as DV^ and DV^ as in 

the illustration. 
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DVs 
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DVs 
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DVs 


DVs 

DV,o 


DVs 



Figure 10. The structure of the stacked MFML design. 

3. Formalized Stacking Methodologies 

In this section, we discuss our stacking approach as part of the MFML 
methodology. Whereas stacking is sometimes applied to increase n, we use stacking as 
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an opportunity to improve for the subsets and overall set of discrete integer 

variables. A stacked design in which each design is based on randomly permuted 
columns of the original design can improve the of the complete design, such that 

-iPmnn) (Cioppa & Lucas, 2007). This method of stacking helps 

efforts to improve the overall design that includes the base design. 

Stacking also provides the freedom to mitigate assumptions that prove false. Our 
discussions in the previous two sections assume £.>d., i = \,2,...,w, where d. is the 
number of variables in the subset of discrete variables. Another assumption is that 
Hdv > kjjy , where kj^y is the total number of discrete variables, = Jj + + • • • + ^ 

and that n^^y > k . When these assumptions do not hold, we use stacking methods. 

To apply stacking methods, we first establish the final number of runs for the 
experiment, n, by comparing and : (1) If , then n = rij^y ; (2) If < tij , 

then we take the ceiling^ of the ratio of the and rij^y as a modification factor, such that 


n = 


‘Z)V 


xn. 


The number of stacks for each DV- in the overall MFML design is 


S: = b: X 


, recalling that is the initial number of stacks for each subset of discrete 


variables. We adjust h, based on the modification factor. 



, if this is possible. 


We leverage the new method for stacking from Cioppa & Lucas (2007) to reduce 
the nonorthogonality of the overall stacked design. We generate a DV design with a 
reasonable p^^ and permute the columns to create a “new” design—once for each 

required number of stacks, s, for the design. Each rearranged LH is not truly a new 
design since there is a one-for-one column match in the new and original design, albeit in 
different column order. However, the overall stacked design is different and an 
improvement in p^^ is evident (Cioppa & Lucas, 2007). We may permute the k 


^ The ceiling of the value x is the smallest integer value that is greater than or equal to x. 
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column s in any of k ! ways and choose s of them to ereate an overall staeked design. Put 


another way, we ean ehoose from any of 



staeked designs. We seleet the set of new 


designs based on a desire to reduee the of the staeked design. Another simple 
program in R allows a search for a set of “new” designs that will stack into an overall 
discrete integer design with an aeeeptable . After selecting the set of “new” designs 

to staek, we see that there are s ! ways of staeking them. Eaeh restaeked design from the 
same set of “new” designs results in the same overall p^^^. These findings provide a 

flexibility to reduee nonorthogonality inherent in a singe subset of discrete variables in 
MFML experiments. 

Laterally appending a staeked design for one subset of diserete variables to 
another subset’s diserete variables design may inerease p^^^ for the joined design, 

DVJoined ■ To minimize p^^ for the joined design, we hold one set of staeked designs as 
the base stack, , while randomly restaeking the subset of interest, . 

During random restaeking, maintains its p^^^, but its eorrelation with 

ehanges, thereby ehanging p^^ for . We continue restaeking until p^^^ 

for is aeeeptable or eeases to improve. Since p^^^ for the joined design will not 

be better than the worse p^^ of the two DVs, it is reasonable to set p^^^ from the worse 
of and as a threshold. The joined design is the new base stack. We 

ineorporate a new in the same manner until all diserete variable subsets are in 

the base staek. 

In the ease of a single integer variable, a simplistie approaeh reaps signifieant 
benefits. We randomly permute a veetor of the possible values of the variable; the veetor 
eonsists of eaeh value repeated exaetly the same number of times that there would have 
been stacks. For instance, a single variable with f = 4 levels and n = 20, results in a 

Vi 20 

veetor eonsisting of the values 1, 2, 3, and 4, eaeh repeated — = — = 5 times. 
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^ = (1,1,1,1,1,...,4,4,4,4,4). The analyst generates a new random permutation of the 

vector, appends it to , measures , and compares it with a specified correlation 

criterion. A maximum number of iterations may also be a criterion. A modification of 
the R program that handles restacking helps in this process. We find the single integer 
variable approach useful when there are few value levels (f <10) and applicable when 
discrete variables cannot be grouped. 

Our overall design strategy includes known design techniques. Centered designs 
and two-level factorial designs are often part of MFML experiments. Additionally, a 
library of orthogonal arrays (Sloane, 2006) offers a number of catalogued orthogonal 
arrays (OAs) that may match the requirements for an MFML experiment. For readers 
interested in more details about OAs, Hedayat et al. (1999) have a comprehensive 
discussion of theory and applications. A group of these OAs already contains designs 
with factors that have varying numbers of value levels. For instance, the library lists an 
MA. 12.2.4.3.1 design. The nomenclature represents a mixed orthogonal array (MA) that 
has 12 runs, 4 two-level factors, and 1 three-level factor. These designs are very specific 
to the types and mixture of factors they can address. 

Many cases may occur for subsets of discrete variables. We do not attempt to 
enumerate them and leave it to the analyst to best simplify the experiment, while using 
our guidance for combining design methods. The goal for this methodology is to develop 
a design that can explore the main effects of each factor. 

Once the final base stack design is complete, the experimenter can apply our new 
techniques or other known methods to create the base design from the set of continuous 
variables. The total number of runs is still n. The number of variables for the base 
design is = k-k^^y . We expect to benefit from the flexibility of our new designs if 

known OLH and NOTH designs cannot accommodate k^^y . 

C. COMBINED AND STACKING METHODOLOGIES 

This section describes our procedure for logically combining stacking 
methodologies and proven design schemes to produce designs with acceptable 
nonorthogonality measures. Because there are many possible types of variables and 
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different numbers of value levels that ean oeeur, we diseuss our general methodology for 
a fairly eomplex ease. A eomplex ease involves a design that has eontinuous variables, 
diserete variables—eaeh set with a different number of value levels and/or single DV that 
has a number of value levels that matehes neither of the DVs’ sets. We inelude a flow 
ehart as a template for the MFML methodology. Deseription of a student thesis that 
employed this proeess illustrates its utility. 


1. Steps for Creating a Mixed Factor, Mixed Level Design 

This seetion lists the general steps to guide the ereation of an MFML design with 
aeeeptable nonorthogonality. In their present form, these steps are helpful, but require 
more speeifleity. Experienee and reason are required to determine how to best employ 
the methodology for eaeh new ease. 

Step 1: Determine the sets of diserete and eontinuous variables. Set tij to meet 
requirements for all variables, sueh that n,>k + \. 

Step 2: Group the set of DVs into subsets that have eommon numbers of value levels 
(f;). Simplify subsets and as logie dietates. 

Step 3: Split subsets of DVs if i^< d. in the subset. 

Step 4: Determine LCM =n^y from the set of £., i = \,...,w. Determine h,., V/. 


Step 5: Adjust the total number of runs to be n= —— x . 

fl 

f^DV 

Step 6: Use new and past teehniques to ereate the base design with an aeeeptable . 

n 

Step 7: Determine the number of staeks required for eaeh set of DVs: 5, =—, V/ or 


5. =h,. X —— , Vi. 

^DV 

Step 8: Create a staeked design for eaeh set of DVs, restaek, and append until the overall 
design meets the threshold for . 


It is eommon for p^^ of the staeked DVs to dietate p^^^ for the overall design. 
At times, the experimenter may wish to staek the base design. 
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2. Design Best Practices within the Mixed Factor, Mixed Level Process 

Intertwining proven design sehemes with the MFML methodology is integral to 
our overall proeess. Full and fraetional faetorial designs, eentered designs, and simple 
LHs are a few of the proven teehniques that also play a role in ereating a full MFML 
design. Although we have eoneentrated on numerie variables, many experiments inelude 
qualitative variables. Binary and eategorieal variables are often essential to the seientifie 
method. 

Reeall that a major objeetive in the MFML proeess is for a balaneed design. 
Understanding that nonnumeric variables play no part in computing correlation, it is still 
important for our technique to incorporate these variables in the MFML. We can cross 
orthogonal designs derived from our techniques with other designs. The flexibility that 
we offer in design dimensions eases the ability to match the dimensions of a balanced 
nonnumeric design. It is not uncommon to have five binary variables in an experiment. 
A full factorial design requires 2^ = 32 runs. In the absence of a catalogued orthogonal 
(or nearly orthogonal) design, we can offer a 32-run design from our technique with the 
set of numeric variables. We can also incorporate larger resolution III or resolution V 
fractional factorial designs, such as those in Sanchez & Sanchez (2005). 

D. EXPLORING FIRST RESPONSE TO A BOMB ATTACK USING A 

MIXED-FACTOR, MIXED-LEVEL DESIGN 

This section demonstrates the utility of the MFML. We discuss the general 
methodology in a fairly complex case involving a graduate student’s thesis. The study is 
ideal for illustration because it explores a mixture of qualitative and quantitative 
variables, which include integer and continuous variables. 

I. The Thesis Prohlem 

Major Jon Roginski’s thesis (2006), “Simulating Emergency Response Using 
Multi-Agent Simulation,” studies first response to a bomb attack in Baltimore’s Inner 
Harbor area (see Figure 11). It leverages simulation techniques to determine the efficient 
use of resources for the Department of Homeland Security (DHS) within the National 
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Exercise Program. DHS’s previous efforts to employ simulations have cost tens of 
millions of dollars, with an unsatisfactory return on investment (Roginski, 2006). 




Figure 11. Baltimore harbor vignette terrain: actual photo and within 

multiagent simulation (Roginski, 2006). 


Major Roginski adapts an existing organizational learning process and integrates 
low- and high-resolution simulation to provide decision support. His study presents the 
potential benefits of low-resolution simulation, using efficient experimental design and 
high-performance computing. He examines a 48-dimensional response surface. These 
factors are a mixture of integer and DVs, as well as three two-level variables and two 
three-level variables. 

This is quite a design challenge. Without the use of an efficient design. 
Major Roginski calculates that a traditional gridded design for 48 factors and 30 

24 

replications requires 1.60 x 10 CPU hours, or approximately 116 trillion times the age 
of the universe, even with the help of the Maui High Performance Computer Center 
(MHPCC). Using an efficient MFML design and the MHPCC, all experiments are 
completed in less than three weeks or 156 CPU centuries (Roginski, 2006). 
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2 . 


Applying the Mixed-Factor, Mixed-Level Methodology to a 
Thesis Question 


Distilling this problem requires an examination of the variables in the experiment. 
The analyst considers 48 variables, three of which are color or pattern indicators in the 
simulation. These three variables are not part of the design. Five other variables are 
discrete: three have two-levels and two have three-levels. Sloane (2006) does offer some 
alternatives for this design, but the designs in the orthogonal array library are fairly 
restrictive in the number of runs and number of mixed factors they can handle. 

Table 23 shows 40 of the variables that are the focus of design development. The 
table shows that many of the variables have only eight levels. Our previous discussions 
in this chapter state that variables with less than ten levels are problematic. A number of 
the variables have 144 or more levels, which may be considered as continuous. Lastly, 
four variables have 11 levels. Table 23 separates the different number of value levels. 
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Table 23. Forty of the variables in Major Roginski’s experiment are displayed. The 
variables are all integers, but some have enough value levels to be eonsidered 
as eontinuous, as the shaded rows indieate. 


Variable Name 

Min Value 

Max Value 

# Value Levels 

Type 

Num Agl 

0 

7 

8 

Integer 

Num Ag2 

0 

7 

8 

Integer 

Num Ag3 

0 

7 

8 

Integer 

Num Ag4 

0 

7 

8 

Integer 

Num Gun 

0 

7 

8 

Integer 

Num EMT SJ 

0 

7 

8 

Integer 

Num EMT MM 

0 

7 

8 

Integer 

Num ESUl 

0 

7 

8 

Integer 

Num ESU2 

0 

7 

8 

Integer 

Num ESU3 

0 

7 

8 

Integer 

Num ESU4 

0 

7 

8 

Integer 

Num ESU5 

0 

7 

8 

Integer 

Num Folll 

0 

7 

8 

Integer 

Num F 0 II 2 

0 

7 

8 

Integer 

Num F 0 II 3 

0 

7 

8 

Integer 

Num Traffl 

0 

7 

8 

Integer 

Num Traff2 

0 

7 

8 

Integer 

Num TrafO 

0 

7 

8 

Integer 

Num Traff4 

0 

7 

8 

Integer 

Num TraffS 

0 

7 

8 

Integer 

Num Traff6 

0 

7 

8 

Integer 

Num Traff7 

0 

7 

8 

Integer 

Num TraffS 

0 

7 

8 

Integer 

Num Traff9 

0 

7 

8 

Integer 

Num TrafflO 

0 

7 

8 

Integer 

Num Traffll 

0 

7 

8 

Integer 

Num Traffl2 

0 

7 

8 

Integer 

Desire Civ 

1 

144 

144 

Integer~Continuous 

Color 

1 

144 

144 

Integer~Continuous 

Marksman Pol 

1 

144 

144 

Integer~Continuous 

Prob Com 

1 

144 

144 

lnteger~Continuous 

Vul Ag 

1 

144 

144 

Integer~Continuous 

Vul Gun 

1 

144 

144 

Integer~Continuous 

Marksman Ag 

1 

144 

144 

Integer~Continuous 

Marksman Gun 

1 

144 

144 

Integer~Continuous 

Num Civ 

0 

150 

151 

Integer~Continuous 

Num SWAT 

5 

15 

11 

Integer 

Attr in Orders 

0 

10 

11 

Integer 

Attr in Ag 

0 

10 

11 

Integer 

EffBomb 

0 

10 

11 

Integer 
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There are a few approaehes that the experiment may use as a modifieation of the 
steps listed previously. We aeknowledge that, at the time of ereating this design, VMIN 
had not yet been fully developed. Beeause 27 of the variables have only eight value 
levels, use of the single integer method is a useful seheme. Nine of the variables have 
144 or more value levels. The experimenter may eonsider these variables to have a 
eommon I of 144. We easily develop an NOTH with previous teehniques. 

The remaining four variables have 11 value levels. Beeause l> d for this subset, 
ereation of a 4 x n NOTH design is possible, followed with a staeking approaeh. The 
overall design will require at least 144 design points. Therefore, it is possible to develop 
an FRLH design for all these numerie variables. A large number of design points 
mitigates round off issues and eonverts the aetual values of DVs into a nearly orthogonal 
design matrix. 

The deeision to treat 40 of the integer variables as eontinuous variables, and 
develop a base design from them, involves the LCM in the two-level and three-level 
variables in the experiment. Three two-level variables require an eight-run design, whieh 
ean be orthogonal. A full-faetorial design for two three-level variables requires nine 
runs. The LCM for these two sets of DVs is 72. Beeause a number of the integer 
variables have 144 levels, two staeks of the eombined DV sets matehes the base design. 

To illustrate. Table 24 has the eight design points of a full-faetorial design for the 
two-level variables. They mateh with one design point of the two, three-level variables 
(diagonal shading). Sinee there are nine design points for the two, three-level variables, 
these eight design points for the two-level, full-faetorial design is repeated for eaeh 
unique design point, whieh results in 72 runs. 
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Table 24. Excerpt of the DV subset design for the MFML, containing 24 of the 144 
total design points, shows how a full, two-level factorial design for three binary 
variables aligns with three design points of two discrete variables, each with three 
value levels. We shade sections of the table to emphasize the match between a 
single design point for the discrete, non-binary variables and a full two-level 
factorial for the binary variables. 


ESU 

FO Police 

Traffic 

Time OP 

Panic 

0 

0 

0 

1 

1 

0 

0 

1 


1 

0 

1 

0 


1 

0 

1 

1 


1 

1 

0 

0 


1 

1 

0 

1 


1 

1 

1 

0 


1 

1 

1 

1 


1 

0 

0 

0 

1 

2 

(1 



1 

2 

(1 


0 

1 

2 

0 



1 

2 

1 


0 

1 

2 

1 



1 

2 

1 


(1 

1 

2 

1 

1 


1 

2 

0 

0 

ij 


3 

0 

0 

1 


3 

0 

1 

0 


3 

0 

1 

1 


3 

1 

0 

0 


3 

1 

0 

1 


3 

1 

1 

0 


3 

1 

1 

1 

1 

3 


Stacking this set of 72 design points atop one another results in 144 total design 
points, which correspond with the number of runs in the base design. Using an iterative 
application of Florian’s method with the remaining 40 variables (assuming all are 
continuous) and n = 144 produces a base design with =0.00766. When the actual 

values of the integer variables treated as continuous are converted (by rounding) into the 
design, the resulting nonorthogonality measure is = 0.0593. Despite the many 
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variables with small numbers of value levels, the nonorthogonality measure remains 
relatively small. The eomplete n = 144, k = A5 design is in Appendix C. 

Examination of this design reveals advantages of the MFML approaeh. An 
NOLH design using Cioppa’s (2002) method to explore 45 faetors requires m = 9, whieh 
eorresponds with n = 1,025 and k = 46. This is if all variables are eonsidered eontinuous. 
If the five qualitative variables are separated, an NOLH design to examine the remaining 
40 variables still requires 1,025 runs sinee m = 8 only results in A: = 37 and n = 513. A 
eonventional erossed design approaeh to eombine the qualitative variables with the 
NOLH eould lead to 73,800 runs. Furthermore, we are unaware of the existenee of either 
a 40- or 45-variable NOLH design with so few design points. Meanwhile, our methods 
have developed an FRLH design that explores up to 172 variables within 193 runs (Nj '72 ) 

and possesses a nonorthogonality measure of 0.0275. The MFML design reduees the 
number of runs in the NOLH by more than 85%, while preserving enough of the 
orthogonality properties of the base design to gain advantages in the analysis of the 
simulation output. 

Major Roginski’s study suoeessfully examines all 48 faetors using this design. 
His analysis of the data from the experimental design yields major findings, undiseovered 
from previous studies: 

• The most important faetor in aehieving sueeess in erisis mitigation is the 
effeetiveness of the poliee. 

• If a poliee foree is not well trained, they may aehieve greater sueeess by 
being less persistent with individuals. 

• Well established, well exeeuted standard operating proeedures (SOPs) 
may play a more important role in first response operations than 
interageney eommunieation. 

• There is a diminishing return after a eertain level of first responder 
training. It may be more effeetive to leverage resourees elsewhere after 
reaehing that level. 

The ability to examine faetors in isolation using the MFML design eontributes to 
the validity and signifieanee of these findings (Roginski, 2006). 
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E. SUMMARY 


MFML experiments are difficult propositions for analysts. They are typically 
problematic, even when using orthogonal or nearly orthogonal designs. Conversion of 
actual values onto the raw OLH and NOTH designs often result in poor orthogonality 
properties, reducing the advantages of these efficient designs. 

Our methodology produces a balanced MFML design that exhibits near 
orthogonal properties that benefit analysts. Using guidelines to partition and shape the 
subsets of DVs and stacking techniques, we apply proven design schemes to develop an 
overall balanced design. The base design consists of continuous variables, or integer 
variables with many value levels. The flexibility of our previous techniques permits 
manipulation of the base design to conform to the needs of set of DVs. 

A new methodology for stacking experimental designs is an important feature of 
the overall process for creating MFML designs. Random column permutation of an 
original NOLH design does not create a new design, but randomly restacking these 
column-permuted designs generally results in a completely new overall design. 
Experience and communication with the customer, as well as knowledge about the 
experiment of interest, directs the analyst to logically apply this methodology. A 
flowchart or list of steps cannot fully capture this art. 

A use-case demonstrates the utility of the MFML methodology. Major Jon 
Roginski’s study (2006) involves 30 replications of a design consisting of 45 continuous 
and integer DVs with varying numbers of value levels, as well as simulation-specific 
factors. Roginski (2006) estimates that if a full factorial design were used, the required 
time to complete these experimental runs approximately equals 116 trillion times the age 
of the universe. Previous NOLH designs by Cioppa (2002) to explore just 45 factors 
require m = 9, which corresponds to n = 1,025, assuming all continuous variables. A 
conventional crossed design approach to combine the qualitative variables with the 
NOLH could lead to 73,800 runs. Our new methodology creates an MFML design with 
144 runs to explore 45 variables. By using the computing poser of the MHPCC along 
with the efficiency of the nearly orthogonal MFML design. Major Roginski’s experiment 
requires less than three weeks or 156 CPU centuries (Roginski, 2006). 
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VI. THE COMPLETE ELEXIBLE OLH/NOLH METHODOLOGY: 
EXPLORING SOLDIER-LEVEL NETWORKS 


We implement our new designs to support the United States Army Training and 
Doetrine Command Analysis Center in a study of network-eentrie warfare. Our design 
helps assess the impaet that Soldier-level, network-enabled eapabilities have on a truek 
terminal’s eargo operations as part of a Joint Foree sustainment base. The researeh 
studies eurrent joint distribution operations and the role that transportation operations 
play in the joint eontext. The system of interest is the operational eoneept of the 
Centralized Reeeiving and Shipping Point (CRSP) and its organization. 

A. PROBLEM STATEMENT 

Major Franeiseo Baez, United States Army, (2008) examines up to 20 faetors of 
mixed types with mixed numbers of value levels to answer the following 
researeh questions; 

• Whieh network-enabled eapability gaps exist in the exeeution of 
transportation Soldiers performing terminal eargo operations tasks, under 
the identified eonditions, to the identified performanee standards? 

• Whieh distribution struetures and types of network-enabled eapabilities 
allow transportation Soldiers to aeeomplish their task to speeified 
standards, under given eonditions? 

• Are the network-enabled eapabilities eurrently available to individual 
transportation Soldiers? 

• What is the operational impaet of leaving the gap unfilled? 

This study foeuses on three dissimilar eommunieations network topologies that 
inelude In-Transit Visibility (ITV), networked eommunieations, and information systems 
that provide network-wide visibility of node and mode status in a shared Logisties 
Common Operating Pieture (LCOP). Operational seenarios from subjeet matter experts 
will employ these networks in a simulation. 

Faetors eome from eoneept speeifie attributes doeuments (JCS, 2005). The study 
eonsiders ITV-availability, ITV-aeeuraey, LCOP-update rates, probability of 
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communications, latency, and communication relay eapability as parameters that 
influenee network eapabilities. Noise faetors sueh as resourees available, eonvoys per 
hour, and eonvoy mix oases examine aspeots of network-enabled eapabilities on a 
broader basis. 

The results from this effort oan support or refute the findings from the funotional 
needs analysis in the Joint Capabilities Integration and Development System. 
Reoommendations from this researoh may also serve to shape future eapabilities 
requirements. Most importantly, it will benefit Soldiers who operate from fixed-based 
faoilities in theaters of operation. 

B. DESIGN OF EXPERIMENTS AND SIMULATION RUNS 

This simulation study uses modeling and simulation and an effieient experimental 
design. Major Baez’ study requires an MFML design for this eomputer experiment. His 
simulation ineludes 1 eategorieal variable, 5 binary variables, and 14 numerie variables 
(integer and eontinuous) with different numbers of value levels. Converting the aetual 
values of diserete variables onto a raw NOTH, results in an overall design with poor 
orthogonality properties. We ereate a base design using the methods of Chapter V (see 
also Hernandez, 2008b) for one integer variable with 16 levels and 12 eontinuous 
variables, using a subset of the eolumns from the n = 16, k = 15 saturated design shown 
in Table 25. We use an MFML design that eonsists of this new S-NOLH design, 
eombining them with staeking methods, and intelligently applying proven design 
teehniques. 
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Table 25. A saturated design for n = 16 and = 15 is the foundation for ereating a 
base design for an MFML design. Its eorrelation measure is 0.0471, with a 

eondition number of 1.319. 



kl 

k2 

k3 

k4 

k5 

k6 

k7 

k8 

k9 

klO 

kll 

kl2 

kl3 

kl4 

kl5 

nl 

10 

8 

13 

6 

16 

8 

15 

16 

14 

3 

12 

2 

13 

7 

9 

n2 

13 

12 

4 

8 

13 

15 

3 

4 

9 

4 

3 

3 

4 

2 

13 

n3 

7 

6 

12 

1 

2 

11 

4 

5 

11 

5 

16 

4 

3 

12 

5 

n4 

9 

16 

6 

4 

12 

5 

8 

14 

5 

6 

10 

15 

2 

15 

12 

n5 

11 

10 

8 

5 

15 

1 

7 

2 

15 

16 

7 

9 

9 

9 

2 

n6 

14 

4 

1 

10 

3 

7 

12 

6 

16 

7 

11 

14 

12 

10 

16 

n7 

6 

9 

14 

16 

11 

3 

9 

1 

3 

8 

15 

6 

7 

8 

15 

n8 

1 

13 

16 

7 

5 

10 

5 

7 

12 

9 

1 

10 

16 

11 

14 

n9 

12 

7 

10 

2 

1 

2 

13 

12 

1 

13 

4 

5 

8 

3 

11 

nlO 

16 

14 

11 

13 

6 

12 

2 

13 

6 

11 

14 

13 

14 

4 

4 

nil 

2 

3 

5 

3 

14 

16 

10 

8 

4 

15 

13 

12 

11 

5 

10 

nl2 

3 

2 

9 

11 

9 

4 

6 

11 

10 

1 

5 

16 

6 

1 

3 

nl3 

15 

1 

15 

12 

10 

14 

11 

9 

7 

10 

2 

11 

5 

16 

7 

nl4 

5 

5 

2 

14 

8 

6 

1 

15 

8 

12 

8 

1 

10 

14 

8 

nl5 

4 

15 

7 

15 

4 

13 

16 

10 

13 

14 

9 

8 

1 

6 

6 

nl6 

8 

11 

3 

9 

7 

9 

14 

3 

2 

2 

6 

7 

15 

13 

1 


We seleet any subset of 13 column s and stack it, building a 32-run by 13-factor 
base design to match the full factorial design (2^ = 32 runs) of five binary variables. 
Next, we cross the resulting design with the three-level categorical variable to create a 
96-run design. Randomly assigning value levels for the three-level integer variable to 
match the overall design creates an NOTH. The final MFML design contains 96 runs to 
address 20 factors in one operational scenario (see Appendix D). It retains the raw 
design’s nonorthogonality measure of = 0.0471. 

We remark that this design was not available when Major Baez did the production 
runs for his thesis. As an alternative design. Major Baez assigns values for the binary and 
categorical variables within the 257-design point design, since it can handle up to 29 
variables (Sanchez, 2006). The complete number of design points to examine three 
different scenarios is 771. The experiment involves 10 replications for each scenario for 
each of the 257 design points, for a total of 7,710 computational experiments. This is far 
less than the computational effort that would be required for 10 replications of a 257- 
design point NOTH crossed with the full, two-level factorial design, but less efficient 
than the MFML design. We note that after rounding to the actual values, the overall 
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nonorthogonality measure from the eatalogued NOLH design is =0.0962—more 

than twiee that of the MFML, while still requiring nearly three times the number of runs. 

Baez (2008) exeeutes the MFML experiment with 10 replieations and 96 design 
points (Hernandez et ah, 2008) for eaeh of the three eommunieations seenarios, totaling 
2,880 eomputational experiments to serve as a validation experiment. 

C. ANALYSIS 

Analysis of the simulation output data was eondueted primarily through the JMP 
Statistieal Diseovery Software (2004), a produet of the Statistieal Analysis Software 
institute. Major Baez ehose JMP (2008) for its data visualization features that allow the 
user to interaetively investigate data, and to refine and understand the analysis results in a 
dynamieally-linked spreadsheet and graphieal environment (Baez, 2008). 

One analysis approaeh was eonstruetion partition trees, whieh we will use to 
illustrate the utility of our new designs. Partition trees reeursively split the data from the 
simulation into homogeneous subsets in aeeordanee with the relationship between the 
response variable and the predietors. Eaeh split eonsiders all possible euts or groupings, 
given the eurrent state of the tree to seleet a partition with the largest likelihood-ratio Chi- 
square statistie. A threshold of 5% ehange in R from the previous split determines when 
to split the data set (Baez, 2008). This method provides insights about the most 
signifieant faetors influeneing the response variables. 

Major Baez (2008) foeuses his study on eomparing the three different network 
struetures and identifying those faetors that have the greatest impaet on eaeh strueture’s 
performanee, as well as differenees in overall performanee for the three struetures. An 
analysis of the mean time in the CRSP is only interpreted as the average time to proeess 
the speeified number of eonvoys in the eontext of the seenario, and not neeessarily the 
“typieal” time for a eonvoy. 

The following plots from the larger design are eonfirmed with our more effieient 
design. They demonstrate interesting results that eonfirm the eomplexity of the system 
and point to the need for further analysis. 
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The plots from JMP in Figure 12 show the utility of LH designs (Baez, 2008). 
The graphies present the time in the CRSP as the f' convoy appears. They reveal 
interesting results from the different scenario settings. Design Points A and B plots show 
two scenarios with an arrival rate of one convoy per hour. Design Point A has low traffic 
intensity and Design Point B has somewhat high traffic intensity. The plot for Design 
Point A indicates that after a certain point the time in CRSP begins decreasing slightly, 
but steadily until the terminating event. The plot for Design Point B outlines a supply 
system that is unable to handle the process for this given combination of factor settings. 
The time in CRSP continues to increase, indicating that very large queues are building 
and each successive convoy takes longer to process than the last. All three networks 
exhibit very consistent behavior (Baez, 2008). 



Figure 12. Time in CRSP by convoy number per network type when the rate of convoy 
arrival is two convoys per hour (Baez, 2008). 

The first branch of the partition tree in Figure 13 reveals that traffic intensity and 
ITV-Available have the greatest influence on the mean time in CRSP. When traffic 
intensity is near zero there is little queuing in the system, while traffic intensity greater 
than 1.0 results in substantial queuing in the system (Baez, 2008). Furthermore, the mean 
time in CRSP improves with timely and reliable ITV data, even when traffic intensity 
nears 1.0. The partition tree clearly identifies that these two factors have the greatest 
influence on the mean response and is the framework for further analysis. 
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Figure 13. Partition tree for mean time in CRSP, hierarehieal network strueture 
(Baez, 2008) 

Major Baez also applies multiple linear regression analysis. The multiple 
regression model, as the approximating funetion for the true funetional relationship 
between the response and regressor variables, examines the behavior of variables of 
interest (Montgomery et ah, 2001). Baez uses a stepwise linear regression method to fit 
regression metamodels for the mean time in CRSP as a funetion of main effeets, 
quadratie effeets, and two-way interaetions. The stepwise regression eontrol in IMP also 
helped identify the most influential faetors. 

The final regression metamodel is in Figure 14. It yields an R of 0.77 and 
eontains two main effeet terms and one interaetion term. Other models that are 
eonsidered inelude additional terms sueh as other main effeets, interaetions, and quadratie 
effeets. However, these additional terms explain only 1% more of the variability. 
Therefore, the parsimonious model is seleeted. The results suggest that traffie intensity 

and ITV-Available are ranked as the two most influential faetors. Traffie intensity is the 
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dominant factor, as indicated by a large |t-ratio|. These results serve to reinforee and 
eomplement those findings in the partition tree. 


Whole Model 
Summary of Fit 

0.768786 
0.766044 
123.8509 
204.0588 
257 

Analysis of Variance 


RSquare 

RSquare Adj 

Root Mean Square Error 

Mean of Response 

Observations (or Sum Wgts) 


Source 

DF 

Sum of 
Squares 

Mean Square 

F Ratio 

Model 

3 

12903598 

4301199 

280.4084 

Error 

253 

3880781 

15339 

Prob > F 

C. Total 

256 

16784379 


<.0001* 


Parameter Estimates 


Term 

Estimate 

Std Error 

t Ratio 

Prob> |t| 

Intercept 

201.08553 

16.68101 

12.05 

<.0001* 

ITV_Available 

-311.4838 

26.82984 

-11.61 

<.0001* 

traffic intensity 

389.47486 

15.0165 

25.94 

<.0001* 

(ITV_Avaiiabie-0.50265)*(traffic intensity-0.41024) 

-248.2801 

48.32956 

-5.14 

<.0001* 


Figure 14. Regression metamodel for mean time in CRSP, hierarehieal 
network strueture (Baez, 2008) 


The findings from the 96-run MFML design are similar and reinforee the larger 
design. In validating the larger design, the MFML proves itself as a viable exploratory 
tool that ean save the experimenter time and money. In this ease, the MFML would have 
saved the experimenter two-thirds of the resourees expended for the full design. 

D. WIDESPREAD UTILITY OF NEW NOLH DESIGNS 

In addition to Major Baez, over a dozen thesis students have used some form of 
FRLH, NOLH, S-NOLH, and MFML designs to eomplete their studies. 
Captain Joshua Ang (2006), Singapore Army, used FRLH designs to understand their 
utility and identify areas to improve the library of OLH and NOLH designs. 
Major Chris Miehel (2006), United States Marine Corps, used earlier versions of the 
FRLH to find more effieient designs to evaluate the Marine Corp’s Artillery Triad. 
Major Jon Alt (2006), United States Army, explored taeties, teehniques, and proeedures 
that troops in small eombat units eould apply in response to the ehallenges of future 
eombat systems and their employment. Major Earl Riehardson (2006), United States 
Marine Corps, employed NOLHs to study sensitivities in the Infantry Warrior 
Simulation (IWARS). 
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VII. CONCLUSIONS AND RECOMMENDATIONS EOR 

EUTURE STUDY 


Immense eomputing power enables the simulation of inereasingly eomplex 
problems, thereby allowing analysts to greatly assist deeision makers in all seetors of 
soeiety. Studies in human behavior and biomimeties often use eomputer simulations that 
rely on effieient designs of experiments, adopting teehniques that allow analysts to 
“systematieally examine a broader range of possible innovations . . (Booker, 2005). 
The United States Marine Corps Warfighting Lab uses agent-based models in eomputer 
experiments to understand nontraditional eombat through the aggregate study of large 
numbers of nonlinear aetors (Ilaehinski, 1997). 

In partieular. Department of Defense faees many of the new ehallenges that 
Brown’s Grave New World (2003) uneovers. The impaet of teehnology on global 
seeurity, the importanee of nonmilitary faetors in military affairs, and the role that 
transnational aetors play in disrupting nation-states are only the beginning in a growing 
list of variables that deeision makers must eonsider—often making physieal 
experimentation infeasible. 

Analysts frequently turn to eomputational experimentation, sueh as eomputer 
simulation, to address the hundreds of input variables for the system(s) of interest. In so 
doing, analysts seareh for sehemas that gain the most information from experiments. 
Computational and monetary eosts for one experimental run push analysts to find 
effieient designs, espeeially when it is ill-advised to make assumptions about variables of 
interest. Earlier designs of experiments may not always be suited for eomputer 
experimentation. The reeent surge of methods to eonstruet effieient designs, most 
notably those with orthogonal properties (OLH and NOTH), offers analysts a new 
methodology for exploring extremely eomplex problems (Kleijnen et al., 2005). The 
prineipal shortfalls of these designs are their striet limitations in dimensionality and their 
seareity. Teehniques that overeome these restrietions will make orthogonal and nearly 
orthogonal designs an important elass of design for exploring large models. 
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Our new techniques to construct experimental designs provide the freedom to 
create a variety of OLH and NOLH designs for many new design dimensions. The 
flexibility of this methodology enables the construction of a design with the exact 
dimensions that an analyst requires. It replaces the awkwardness of fitting value levels in 
a catalogued design, by using each value level more than once when a variable contains 
fewer levels than runs. Additionally, our method possesses the ability to produce new 
designs in a matter of hours. 

Introducing a new family of S-NOLHs contributes to the discipline of design of 
experiments. The efficiency of examining the experimental space of k variables from a 
sample size, n = k + 1 in a LH offers great potential for simulation experiments. The 

fk'] 

ability to select any one of subsets of columns or exchange columns between subsets 

in an S-NOLH design gives analysts unprecedented flexibility. 

This new construction methodology fills much of the void in the OLH and NOLH 

library. Catalogued designs from Cioppa’s (2002) original work explore k = m + 

factors. The gap between any two sequentially-sized designs is m - 1 factors and grows 
as m increases. S-NOLH designs fill gaps between the number of factors, as well as the 
combinatorial voids created when also considering the sample size, n = 2’" + 1. For 
instance, a gap exists between O]'^ and Affwhen there is a need to address between 

8 and 10 factors for, at most, 17 runs. A saturated easily fills this gap. Currently, no 
OLH or NOLH designs exist between n = 17 and n = 33. We build a saturated 
(Appendix A) that does not conform to n or k restrictions that exist in previous OLH and 
NOLH construction methods. We have catalogued a number of saturated designs that 
address from 2 to 63 factors, with no more than 64 runs. More designs are available for 
the asking. 

One research area that requires further study is the performance of our new 

designs in the presence of interaction and quadratic terms. OLH and NOLH designs from 

Ye (1998), Cioppa (2002), and Ang (2006) retain their orthogonal properties in the 

presence of a single quadratic or two-factor interaction term. Our preliminary findings 
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show that RLH outperforms OLH and NOLH in terms of as the number of quadratic 

or two-factor interaction terms increase, but this requires further study. The impact of 
higher order interactions in an experiment is also worthy of examination. An 
understanding of thresholds related to the number and order of nonlinear terms that may 
possibly be in an experiment will guide analysts for using our designs. 

The success of VMIN for building new orthogonal or nearly orthogonal designs is 
a significant step for using optimization methods to construct experimental designs. The 
initial design that VMIN uses is important for quickly finding a solution, or in finding a 
solution at all. Development of a more comprehensive heuristic to initiate VMIN would 
be a major contribution to this area of research. Our current approach is to start with a 
new FRLH when VMIN finds difficulty in converging to a solution. A better method 
than rms for selecting the next column to optimize may also improve the algorithm. At 
times, VMIN remains on the same column throughout the optimization routine because 
one run of VMIN on the targeted column shows no improvement, thereby resulting in the 
same column with the greatest rms. A means to move the program to the next column 
should help VMIN’s performance. A process to automatically “skip” to the next column 
after no improvements in the selected column will enable VMIN to continue. This forced 
movement can in turn affect the optimization of the unimproved column when the routine 
eventually returns to it. Currently, we manually stop the program and target the column 
that has the next greatest rms. Although this process has proven effective, a more 
automated means would be very useful. 
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APPENDIX A. SATURATED NOLH DESIGNS 


Appendix A displays a number of saturated NOLH designs that we developed 
during the eourse of our study. Eaeh design is designated with a symbol per Cioppa 
(2002) and the nonorthogonality measure for the design. 
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< with =0.0499 
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First 32 columns of with = 0.0443 







Final 31 columns of with = 0.0443 




APPENDIX B. PAIRWISE PLOT OE EACTORS IN 


Appendix B displays a pairwise plot of the faetors in N^. The dispersion of 
points in eaeh pairwise plot shows reasonable spaee filling properties. 
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APPENDIX C. MAJOR ROGINSKI’S DESIGN 


This Appendix lists the MFML design matrix for Major Roginski’s experiment. 
The design is split into eight seetions. Eaeh seetion heading deseribes the design points 
and the faetors shown in the seetion. It also ineludes the nonorthogonality of the design. 
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Roginski MFML Section \ : n=\ thru 12,k=\ thru 11 with = 0.0593 . 
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Roginski MFML Section 3: n = 1 thru 72, k= 12 thru 22 with= 0.0593 . 
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Roginski MFML Section 4: n = 13 thru 144, k= 12 thru 22 with= 0.0593 . 


NumESU4 I NumESUS I Num FolM I Num Foll2 I Num Foll3 I NumTraffI I NumTraff2 I 
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Roginski MFML Section 5: n=\ thru 72, k = 23 thru 33 with = 0.0593 . 
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Roginski MFML Section 6: n = 73 thru 144, k = 23 thru 33 with p = 0.0593 . 


NumTraffS NumTraff4 NumTraffS NumTraffG NumTraff? NumTraffS NumTraffS Num Traffic NumTraffll NumTraff12 Desire Civ 
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Roginski MFML Section 8: n = 13 thru 144, A: = 34 thru 45 with= 0.0593 . 
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APPENDIX D 


MAJOR BAEZ’S MEML DESIGN 
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APPENDIX E. A p™ PREDICTIVE MODEL WITH 
QUADRATIC TERMS 

The final multiple linear regression model to predict for a given n and A: is a 
relatively simple expression: 

(= 0.0873 + 7.859n - 0.109A: -11.702n . 

It is very tractable in its representation of n and k. We see that as n increases and k 
remains constant the first term is dominant in the expression and the mean maximum 
absolute pairwise correlation decreases, as Owen (1994) theorized. However, when k 
also grows large p™ reduces less quickly because of the values in the second term. 

These results are reasonable since an increase in k increases the number of columns, 
thereby creating more possibility of a new column permutation that may have high 
correlation with one of the other columns. 

Analysis of this model shows that it is adequate for predicting the best from 

G200 trials, given n and k. Statistics for the residuals indicate that the model is 
acceptable for its ability to predict. The range of residuals, in comparison with the 
magnitude of the predicted values, is relatively small. 

Figure 15 displays curvature, suggesting that inclusion of quadratic terms may 
provide an improvement. 


125 




Figure 15. The plot for the residuals resulting from a G200 MLR shows curvature that 
indicates a need to include more complex terms in the regression model. 

We explore the impact of adding more complex terms in the equation to help the 
model’s performance. We extend the model from Chapter II to include quadratic terms 
for transformed n and transformed k. The result is the following eight-term equation with 
an = 0.9996 . 

( pZp j"" = 0.03054 + 0.03208*-0.100806+13.06842*n""'' - 68.38078* 

-30.12779*yf%-"'^+254.8917*yf‘'^n-^'^+17.93109*A:“"'^n""'^-254.8391*yf"'^n-"'^ 

The residual plot (Figure 16) for this new model continues to show curvature, but 
the sizes of the residuals are a full order of magnitude less that those from the more 
simple equation. 
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Figure 16. The residual plot for the new multiple linear regression model with quadratie 
terms still show curvature, but the sizes of the residuals are much smaller 
than from previous models. 

The trade-off with this more precise formula is an increase in difficulty in 
explaining the impacts of n and k, as well as a reduction in the ease in which an 
experimenter can use it. Although the residual analysis of this multiple linear regression 
model reveals that there is some curvature in the relationship, the amount of error in the 
predicted values is relatively small. Since the majority of experimenters would use this 
formula mostly as a guide to find the appropriate design dimensions and not itself, 
the simpler model serves our purposes. 
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